TWO LIKELIHOOD-BASED SEMIPARAMETRIC ESTIMATION
METHODS FOR PANEL COUNT DATA WITH COVARIATES

By Jon A. Wellner and Ying Zhang

University of Washington and University of Iowa

We consider estimation in a particular semiparametric regression
model for the mean of a counting process with "panel count" data.
The basic model assumption is that the conditional mean function
of the counting process is of the form
$E\{N(t) \mid Z\} = \exp(\beta_0^T Z)\Lambda_0(t)$,
where $Z$ is a vector of covariates and $\Lambda_0$ is the baseline mean
function. The "panel count" observation scheme involves observation of
the counting process $N$ for an individual at a random number $K$ of
random time points; both the number and the locations of these time
points may differ across individuals.

We study semiparametric maximum pseudo-likelihood and maximum
likelihood estimators of the unknown parameters $(\beta_0, \Lambda_0)$
derived on the basis of a nonhomogeneous Poisson process assumption.
The pseudo-likelihood estimator is fairly easy to compute, while the
maximum likelihood estimator poses more challenges from the
computational perspective. We study asymptotic properties of both
estimators assuming that the proportional mean model holds, but
dropping the Poisson process assumption used to derive the estimators.
In particular we establish asymptotic normality for the estimators
of the regression parameter $\beta_0$ under appropriate hypotheses. The
results show that our estimation procedures are robust in the sense
that the estimators converge to the truth regardless of the underlying
counting process.

1. Introduction. Suppose that $N = \{N(t) : t \ge 0\}$ is a univariate
counting process. In many applications, it is important to estimate the
expected number of events $E\{N(t) \mid Z\}$ which will occur by the time
$t$, conditionally on a covariate vector $Z$.

In this paper we consider the proportional mean regression model given
by

(1.1)   $\Lambda(t \mid Z) \equiv E\{N(t) \mid Z\} = e^{\beta_0^T Z}\Lambda_0(t)$,

where $\Lambda_0$ is a monotone increasing baseline mean function. The
parameters of primary interest are $\beta_0$ and $\Lambda_0$.

Suppose that we observe the counting process $N$ at a random number $K$ of
random times $0 = T_{K,0} < T_{K,1} < \cdots < T_{K,K}$. We write
$T_K = (T_{K,1}, \ldots, T_{K,K})$, and we assume that
$(K, T_K \mid Z) \sim G(\cdot \mid Z)$ is conditionally independent of the
counting process $N$ given the covariate vector $Z$. We further assume that
$Z \sim H$ on $\mathbb{R}^d$ with some mild conditions on $H$ for the
identifiability of our semiparametric regression model given in Section 3.

The observation for each individual consists of

(1.2)   $X = (Z, K, T_K, N(T_{K,1}), \ldots, N(T_{K,K})) \equiv (Z, K, T_K, N_K)$.

This type of data is referred to as panel count data. Throughout this
manuscript, we will assume that we observe $n$ i.i.d. copies,
$X_1, \ldots, X_n$, of $X$.

Panel count data arise in many fields including demographic studies,
industrial reliability and clinical trials; see, for example, Kalbfleisch
and Lawless [9], Gaver and O'Muircheartaigh [5], Thall and Lachin [16],
Thall [15], Sun and Kalbfleisch [13] and Wellner and Zhang [21], where the
estimation of either the intensity of event recurrence or the mean function
of a counting process with panel count data was studied. Many applications
involve covariates whose effects on the underlying counting process are of
interest. While there is considerable work on regression modeling for
recurrent events based on continuous observations (see, e.g., Lawless and
Nadeau [10], Cook, Lawless and Nadeau [3] and Lin, Wei, Yang and Ying [11]),
regression analysis of counting processes with panel count data has begun
only recently. Sun and Wei [14] and Hu, Sun and Wei [6] proposed estimating
equation methods, while Zhang [25, 26] proposed a pseudo-likelihood method
for studying the proportional mean model (1.1) with panel count data.

To derive useful estimators for this model we will often assume, in
addition to (1.1), that the counting process $N$, conditionally on $Z$, is a
nonhomogeneous Poisson process. But our general perspective will be to study
the estimators and other procedures when the Poisson assumption may be
violated and we assume only that the proportional mean assumption (1.1)
holds. Such a program was carried out by Wellner and Zhang [21] for
estimation of $\Lambda_0$ without any covariates for this panel count
observation model.

The outline of the rest of the paper is as follows. In Section 2 we describe
two methods of estimation, namely maximum pseudo-likelihood and maximum
likelihood estimators of $(\beta_0, \Lambda_0)$. The basic picture is that
the pseudo-likelihood estimator is computationally relatively
straightforward and easy to implement, while the maximum likelihood
estimators are considerably more difficult, requiring an iterative algorithm
in the computation of the profile likelihood. In Section 3, we state the
main asymptotic results: strong consistency, rate of convergence and
asymptotic normality of $\hat\beta_n^{ps}$ and $\hat\beta_n$, for the
maximum pseudo-likelihood and maximum likelihood estimators
$(\hat\beta_n^{ps}, \hat\Lambda_n^{ps})$ and $(\hat\beta_n, \hat\Lambda_n)$
of $(\beta_0, \Lambda_0)$ assuming only the proportional mean structure
(1.1), but not assuming that $N$ is a Poisson process. These results are
proved in Section 5 by use of tools from empirical process theory. Although
pseudo-likelihood methods have been studied in the context of parametric
models by Lindsay [12] and Cox and Reid [4], not much seems to be known
about their behavior in nonparametric and semiparametric settings such as
the one studied here, even assuming that the base model holds. In Section 4
we present the results of simulation studies to demonstrate the robustness
of the methods and compare the relative efficiency of the two methods. An
application of our methods to a bladder tumor study is presented in this
section as well. A general theorem concerning asymptotic normality of
semiparametric M-estimators and a technical lemma upon which the proofs of
our main theorems rely are stated and proved in Sections 6 and 7,
respectively.

2. Two likelihood-based semiparametric estimation methods. 

Maximum pseudo-likelihood estimation. To derive our estimators we assume
that conditionally on $Z$, $N$ is a nonhomogeneous Poisson process with
mean function given by (1.1). The pseudo-likelihood method for this model
uses the marginal distributions of $N$, conditional on $Z$,

   $P(N(t) = k \mid Z) = \frac{\Lambda(t \mid Z)^k}{k!}\exp(-\Lambda(t \mid Z))$,

and ignores dependence between $N(t_1)$, $N(t_2)$ to obtain the log
pseudo-likelihood:

   $l_n^{ps}(\beta, \Lambda) = \sum_{i=1}^{n}\sum_{j=1}^{K_i} \{ N^{(i)}(T^{(i)}_{K_i,j}) \log \Lambda(T^{(i)}_{K_i,j}) + N^{(i)}(T^{(i)}_{K_i,j})\beta^T Z_i - e^{\beta^T Z_i}\Lambda(T^{(i)}_{K_i,j}) \}$.

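Let $\mathcal{R} \subset \mathbb{R}^d$ be a bounded and convex set, and let
$\mathcal{F}$ be the class of functions

(2.1)   $\mathcal{F} \equiv \{\Lambda : [0,\infty) \to [0,\infty) \mid \Lambda$ is monotone nondecreasing, $\Lambda(0) = 0\}$.

Then the maximum pseudo-likelihood estimator
$(\hat\beta_n^{ps}, \hat\Lambda_n^{ps})$ of $(\beta_0, \Lambda_0)$ is given by
$(\hat\beta_n^{ps}, \hat\Lambda_n^{ps}) \equiv \arg\max_{(\beta,\Lambda) \in \mathcal{R} \times \mathcal{F}} l_n^{ps}(\beta, \Lambda)$.
This can be implemented in two steps via the usual profile
(pseudo-)likelihood. For each fixed value of $\beta$ we set

(2.2)   $\hat\Lambda_n^{ps}(\cdot, \beta) \equiv \arg\max_{\Lambda \in \mathcal{F}} l_n^{ps}(\beta, \Lambda)$

and define $l_n^{ps,profile}(\beta) \equiv l_n^{ps}(\beta, \hat\Lambda_n^{ps}(\cdot, \beta))$.
Then $\hat\beta_n^{ps} = \arg\max_{\beta \in \mathcal{R}} l_n^{ps,profile}(\beta)$
and $\hat\Lambda_n^{ps} = \hat\Lambda_n^{ps}(\cdot, \hat\beta_n^{ps})$. Note that
$l_n^{ps}(\beta, \Lambda)$ depends on $\Lambda$ only at the observation time
points. By convention, we define our estimator $\hat\Lambda_n^{ps}$ to be the
one that has jumps only at the observation time points to ensure uniqueness.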

The optimization problem in (2.2) is easily solved and the details of the 
solution can be found in Zhang [26]. 
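
As an illustration of the profile step (2.2), the following Python sketch may help. It is not taken from Zhang [26], whose explicit formula is not reproduced here. It assumes that, for fixed $\beta$, grouping the pseudo-likelihood by distinct observation time $s_l$ gives the objective $\sum_l \{a_l \log \lambda_l - b_l \lambda_l\}$, with $a_l$ the total count and $b_l$ the total of $e^{\beta^T Z}$ over all observations made at $s_l$, and that its maximizer over nondecreasing $\lambda_l$ is the isotonic regression of the ratios $a_l/b_l$ with weights $b_l$ (the standard result for Poisson-type objectives), computed here by pool-adjacent-violators. The function name and the flat input layout are hypothetical.

```python
import math


def profile_pseudo_baseline(times, counts, covariates, beta):
    """Sketch of (2.2): maximize sum{N log L(t) - exp(beta'Z) L(t)} over
    nondecreasing L, for a fixed beta.

    times, counts: flat lists over all (subject, visit) pairs.
    covariates:    matching list of covariate vectors, one per pair.
    Returns (sorted distinct times, fitted baseline values at those times).
    """
    num, den = {}, {}
    for t, n, z in zip(times, counts, covariates):
        w = math.exp(sum(b * zk for b, zk in zip(beta, z)))
        num[t] = num.get(t, 0.0) + n   # a_l: total count observed at time t
        den[t] = den.get(t, 0.0) + w   # b_l: total exp(beta'Z) at time t
    grid = sorted(num)

    # Pool-adjacent-violators on the ratios a_l / b_l with weights b_l.
    blocks = []  # each block is [sum of a, sum of b, number of grid points]
    for t in grid:
        blocks.append([num[t], den[t], 1])
        while len(blocks) > 1:
            (a0, b0, _), (a1, b1, _) = blocks[-2], blocks[-1]
            if a0 * b1 <= a1 * b0:  # block ratios already nondecreasing
                break
            a, b, m = blocks.pop()
            blocks[-1][0] += a
            blocks[-1][1] += b
            blocks[-1][2] += m

    fitted = []
    for a, b, m in blocks:
        fitted.extend([a / b] * m)
    return grid, fitted
```

With all covariate effects set to zero and counts 1, 4, 2 at times 1, 2, 3, the last two ratios violate monotonicity and are pooled to their weighted average 3, so the fitted values are 1, 3, 3.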

Maximum likelihood estimation. Under the assumption that conditionally on
$Z$, $N$ is a nonhomogeneous Poisson process, the likelihood can be
calculated using the (conditional) independence of the increments of $N$,
$\Delta N(s,t] \equiv N(t) - N(s)$, and the Poisson distribution of these
increments,

   $P(\Delta N(s,t] = k \mid Z) = \frac{[\Delta\Lambda((s,t] \mid Z)]^k}{k!}\exp(-\Delta\Lambda((s,t] \mid Z))$,

to obtain the log-likelihood

   $l_n(\beta, \Lambda) = \sum_{i=1}^{n}\sum_{j=1}^{K_i} \{ \Delta N^{(i)}_{K_i j} \log \Delta\Lambda_{K_i j} + \Delta N^{(i)}_{K_i j}\beta^T Z_i - e^{\beta^T Z_i}\Delta\Lambda_{K_i j} \}$,

where

   $\Delta N_{Kj} \equiv N(T_{K,j}) - N(T_{K,j-1})$,   $j = 1, \ldots, K$,
   $\Delta\Lambda_{Kj} \equiv \Lambda(T_{K,j}) - \Lambda(T_{K,j-1})$,   $j = 1, \ldots, K$.

Then $(\hat\beta_n, \hat\Lambda_n) \equiv \arg\max_{(\beta,\Lambda) \in \mathcal{R} \times \mathcal{F}} l_n(\beta, \Lambda)$.
This maximization can also be carried out in two steps via profile
likelihood. For each fixed value of $\beta$ we set
$\hat\Lambda_n(\cdot, \beta) \equiv \arg\max_{\Lambda \in \mathcal{F}} l_n(\beta, \Lambda)$,
and define $l_n^{profile}(\beta) \equiv l_n(\beta, \hat\Lambda_n(\cdot, \beta))$. Then
$\hat\beta_n = \arg\max_{\beta \in \mathcal{R}} l_n^{profile}(\beta)$ and
$\hat\Lambda_n = \hat\Lambda_n(\cdot, \hat\beta_n)$. Similarly, the estimator
$\hat\Lambda_n$ is defined to have jumps only at the observation time points.
To compute the estimate $(\hat\beta_n, \hat\Lambda_n)$, we adopt a doubly
iterative algorithm to update the estimates alternately. The sketch of the
algorithm consists of the following steps:

S1. Choose the initial $\beta^{(0)} = \hat\beta_n^{ps}$, the maximum
    pseudo-likelihood estimator.

S2. For given $\beta^{(p)}$ ($p = 0, 1, 2, \ldots$), the updated estimate of
    $\Lambda$, $\Lambda^{(p)}$, is computed by the modified iterative convex
    minorant algorithm proposed by Jongbloed [8] on the likelihood
    $l_n(\beta^{(p)}, \Lambda)$. Initialize this algorithm using
    $\Lambda^{(p-1)}$ and stop the iteration when

       $\frac{|l_n(\beta^{(p)}, \Lambda^{new}) - l_n(\beta^{(p)}, \Lambda^{current})|}{|l_n(\beta^{(p)}, \Lambda^{current})|} \le \eta$.

    In the very first step, we choose the starting value of $\Lambda$ by
    interpolating $\hat\Lambda_n^{ps}$ linearly between two adjacent jump
    points to make it monotone increasing, so that the likelihood
    $l_n(\beta, \Lambda)$ is well defined.

S3. For given $\Lambda^{(p)}$, the updated estimate of $\beta$,
    $\beta^{(p+1)}$, is obtained by optimizing $l_n(\beta, \Lambda^{(p)})$
    using the Newton-Raphson method. Initialize the algorithm using
    $\beta^{(p)}$ and stop the iteration when
    $\|\beta^{new} - \beta^{current}\|_\infty \le \eta$.

S4. Repeat steps S2 and S3 until the following convergence criterion is
    satisfied:

       $\frac{|l_n(\beta^{(p+1)}, \Lambda^{(p+1)}) - l_n(\beta^{(p)}, \Lambda^{(p)})|}{|l_n(\beta^{(p)}, \Lambda^{(p)})|} \le \eta$.

As in the case of pseudo-likelihood studied in Zhang [26], it is easy to
verify that for any given monotone nondecreasing function $\Lambda$, the
likelihood $l_n(\beta, \Lambda)$ is a concave function of the regression
parameter $\beta$ with a negative definite Hessian matrix. Using this fact,
we can easily show that the iteration process increases the value of the
likelihood, that is,
$l_n(\beta^{(p+1)}, \Lambda^{(p+1)}) - l_n(\beta^{(p)}, \Lambda^{(p)}) \ge 0$
for $p = 0, 1, \ldots$.
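
The Newton-Raphson update in step S3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' R implementation: the function names, the nested-list data layout and the small Gaussian-elimination helper are all hypothetical. For fixed $\Lambda$, the gradient of $l_n$ in $\beta$ is $\sum_i (\sum_j \Delta N^{(i)}_{K_i j} - e^{\beta^T Z_i}\sum_j \Delta\Lambda_{K_i j}) Z_i$ and the Hessian is $-\sum_i e^{\beta^T Z_i}(\sum_j \Delta\Lambda_{K_i j}) Z_i Z_i^T$, which is the source of the concavity noted above.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def newton_beta(dN, dLam, Z, beta, eta=1e-10, max_iter=50):
    """Maximize l_n(beta, Lambda) in beta for a fixed Lambda (step S3).

    dN[i], dLam[i]: count and baseline-mean increments for subject i.
    Z[i]:           covariate vector of subject i.
    Assumes the negative Hessian is nonsingular (no safeguard is included).
    """
    d = len(beta)
    for _ in range(max_iter):
        grad = [0.0] * d
        neg_hess = [[0.0] * d for _ in range(d)]
        for dn_i, dl_i, z in zip(dN, dLam, Z):
            lin = sum(b * zk for b, zk in zip(beta, z))
            mu = math.exp(lin) * sum(dl_i)   # fitted mean at the last visit
            resid = sum(dn_i) - mu
            for a in range(d):
                grad[a] += resid * z[a]
                for c in range(d):
                    neg_hess[a][c] += mu * z[a] * z[c]
        step = solve(neg_hess, grad)         # Newton direction
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) <= eta:  # sup-norm stopping rule of S3
            break
    return beta
```

For a single binary covariate the score equation can be solved by hand: with total count 5 and total baseline increment 2 among subjects with $Z = 1$, the maximizer is $\log(5/2)$, which the iteration reproduces.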

The iterative algorithms proposed via the profile pseudo-likelihood or the
profile likelihood approach converged reliably in our simulation
experiments described in Section 4, and the convergence did not seem to be
affected by the starting point. However, the algorithms are not
computationally efficient, especially for the maximum likelihood estimation
method. A considerable number of iterations is generally needed to achieve
the convergence criterion stated in S4. Moreover, computing the profile
estimator $\hat\Lambda_n$ in S2 involves the modified iterative convex
minorant algorithm, which also needs a large number of iterations to meet
the criterion stated in S2. Our simulation experiment with sample size
$n = 100$ shows that computing the maximum likelihood estimator with
$\eta = 10^{-10}$ needs about 1800 minutes to converge on a PC (Intel Xeon
CPU 2.80 GHz) with the algorithm coded in R. Compared to the profile
likelihood algorithm, the profile pseudo-likelihood algorithm is
computationally less demanding, since the profile pseudo estimator
$\hat\Lambda_n^{ps}$ has an explicit solution, as shown in Zhang [26], and
hence does not involve any iteration. As a result, computing the maximum
pseudo-likelihood estimator is much faster than computing the maximum
likelihood estimator.

3. Asymptotic theory: results. In this section we study the properties of
the estimators $(\hat\beta_n^{ps}, \hat\Lambda_n^{ps})$ and
$(\hat\beta_n, \hat\Lambda_n)$. We establish strong consistency and derive
the rate of convergence of both estimators in some $L_2$-metrics related
to the observation scheme. We also establish the asymptotic normality of
both $\hat\beta_n^{ps}$ and $\hat\beta_n$ under some mild conditions.

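First we give some notation. Let $\mathcal{B}_d$ and $\mathcal{B}$ denote
the collection of Borel sets in $\mathbb{R}^d$ and $\mathbb{R}$,
respectively, and let
$\mathcal{B}_1[0,\tau] = \{B \cap [0,\tau] : B \in \mathcal{B}\}$ and
$\mathcal{B}_2[0,\tau] = \mathcal{B}_1[0,\tau] \times \mathcal{B}_1[0,\tau]$.
On $([0,\tau], \mathcal{B}_1[0,\tau])$ we define measures $\mu_1$, $\mu_2$,
$\nu_1$, $\nu_2$ and $\gamma$ as follows: for
$B, B_1, B_2 \in \mathcal{B}_1[0,\tau]$ and $C \in \mathcal{B}_d$, set

   $\nu_1(B \times C) = \int_C \sum_{k=1}^{\infty} P(K = k \mid Z = z) \sum_{j=1}^{k} P(T_{k,j} \in B \mid K = k, Z = z)\, dH(z)$,

   $\nu_2(B_1 \times B_2 \times C) = \int_C \sum_{k=1}^{\infty} P(K = k \mid Z = z) \sum_{j=1}^{k} P(T_{k,j-1} \in B_1, T_{k,j} \in B_2 \mid K = k, Z = z)\, dH(z)$

and

   $\gamma(B) = \int_{\mathbb{R}^d} \sum_{k=1}^{\infty} P(K = k \mid Z = z) P(T_{k,k} \in B \mid K = k, Z = z)\, dH(z)$.

We also define the $L_2$-metrics $d_1(\theta_1, \theta_2)$ and
$d_2(\theta_1, \theta_2)$ in the parameter space
$\Theta = \mathcal{R} \times \mathcal{F}$ as

   $d_1(\theta_1, \theta_2) = \{|\beta_1 - \beta_2|^2 + \|\Lambda_1 - \Lambda_2\|^2_{L_2(\mu_1)}\}^{1/2}$,

   $d_2(\theta_1, \theta_2) = \{|\beta_1 - \beta_2|^2 + \|\Delta\Lambda_1 - \Delta\Lambda_2\|^2_{L_2(\mu_2)}\}^{1/2}$,

where $\mu_1(B) = \nu_1(B \times \mathbb{R}^d)$ and
$\mu_2(B_1 \times B_2) = \nu_2(B_1 \times B_2 \times \mathbb{R}^d)$. To
establish consistency, we assume that: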

C1. The true parameter $\theta_0 = (\beta_0, \Lambda_0) \in \mathcal{R}^\circ \times \mathcal{F}$,
    where $\mathcal{R}^\circ$ is the interior of $\mathcal{R}$.

C2. For all $j = 1, \ldots, K$, $K = 1, 2, \ldots$, the observation times
    $T_{K,j}$ are random variables taking values in the bounded interval
    $[0, \tau]$ for some $\tau \in (0, \infty)$. The measure
    $\mu_l \times H$ on
    $([0,\tau]^l \times \mathbb{R}^d, \mathcal{B}_l[0,\tau] \times \mathcal{B}_d)$
    is absolutely continuous with respect to $\nu_l$ for $l = 1, 2$, and
    $E(K) < \infty$.

C3. The true baseline mean function $\Lambda_0$ satisfies
    $\Lambda_0(\tau) \le M$ for some $M \in (0, \infty)$.

C4. The function $M_0^{ps}$ defined by
    $M_0^{ps}(X) \equiv \sum_{j=1}^{K} N_{Kj} \log(N_{Kj})$ satisfies
    $P M_0^{ps}(X) < \infty$.

C5. The function $M_0$ defined by
    $M_0(X) \equiv \sum_{j=1}^{K} \Delta N_{Kj} \log(\Delta N_{Kj})$ satisfies
    $P M_0(X) < \infty$.

C6. $\mathcal{Z} \equiv \mathrm{supp}(H)$, the support of $H$, is a bounded
    set in $\mathbb{R}^d$. [Thus there exists $z_0 > 0$ such that
    $P(|Z| \le z_0) = 1$.]

C7. For all $a \in \mathbb{R}^d$, $a \ne 0$, and $c \in \mathbb{R}$,
    $P(a^T Z \ne c) > 0$.

Condition C7 is needed together with $\mu_1 \times H \ll \nu_1$ from C2 to
establish identifiability of the semiparametric model.

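Theorem 3.1. Suppose that conditions C1-C7 hold and the conditional
mean structure of the counting process $N$ is given by (1.1). Then for every
$b < \tau$ for which $\mu_1([b, \tau]) > 0$,

   $d_1((\hat\beta_n^{ps}, \hat\Lambda_n^{ps} 1_{[0,b]}), (\beta_0, \Lambda_0 1_{[0,b]})) \to 0$   a.s. as $n \to \infty$.

In particular, if $\mu_1(\{\tau\}) > 0$, then

   $d_1((\hat\beta_n^{ps}, \hat\Lambda_n^{ps}), (\beta_0, \Lambda_0)) \to 0$   a.s. as $n \to \infty$.

Moreover, for every $b < \tau$ for which $\gamma([b, \tau]) > 0$,

   $d_2((\hat\beta_n, \hat\Lambda_n 1_{[0,b]}), (\beta_0, \Lambda_0 1_{[0,b]})) \to 0$   a.s. as $n \to \infty$.

In particular, if $\gamma(\{\tau\}) > 0$, then

   $d_2((\hat\beta_n, \hat\Lambda_n), (\beta_0, \Lambda_0)) \to 0$   a.s. as $n \to \infty$.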

Remark 3.1. Some condition along the lines of the absolute continuity
part of C2 is needed. For example, suppose that $\Lambda_0(t) = t^2$,
$\beta_0 = 0$, $\Lambda(t) = t$ and $\beta = 1$. Then if we observe at just
one time point $T$ (so $K = 1$ with probability 1), and $T = e^Z$ with
probability 1, then $\Lambda_0(T)e^{\beta_0 Z} = \Lambda(T)e^{\beta Z}$
almost surely and the model is not identifiable. C2 holds, in particular, if
$(K, T_K)$ is independent of $Z$. The conditions on the measure
$\mu_2 \times H$ in C2 and C5 are not needed for proving consistency of
$\hat\theta_n^{ps} = (\hat\beta_n^{ps}, \hat\Lambda_n^{ps})$, while the
conditions on the measure $\mu_1 \times H$ in C2 and C4 are not needed for
proving consistency of $\hat\theta_n = (\hat\beta_n, \hat\Lambda_n)$.
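
The counterexample in Remark 3.1 can be checked numerically. The short Python sketch below (helper names are hypothetical, and $Z$ is taken to be scalar for simplicity) evaluates the conditional mean (1.1) at the single observation time $T = e^Z$ under both parameter pairs; both equal $e^{2Z}$, so data observed only at $T$ cannot distinguish them.

```python
import math


def conditional_mean(baseline, beta, z, t):
    """E{N(t) | Z = z} = exp(beta * z) * baseline(t) for a scalar covariate."""
    return math.exp(beta * z) * baseline(t)


def compare_at_observation_time(z):
    """Evaluate both candidate models at the only observation time T = e^Z."""
    t = math.exp(z)
    truth = conditional_mean(lambda s: s ** 2, 0.0, z, t)  # Lambda_0(t) = t^2, beta_0 = 0
    rival = conditional_mean(lambda s: s, 1.0, z, t)       # Lambda(t) = t,   beta = 1
    return truth, rival


pairs = [compare_at_observation_time(z) for z in (-1.5, -0.2, 0.0, 0.7, 2.0)]
```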

To derive the rate of convergence, we also assume that:

C8. For some interval $O[T] = [\sigma, \tau]$ with $\sigma > 0$ and
    $\Lambda_0(\sigma) > 0$,
    $P(\cap_{j=1}^{K} \{T_{K,j} \in [\sigma, \tau]\}) = 1$.

C9. $P(K \le k_0) = 1$ for some $k_0 < \infty$.

C10. For some $v_0 \in (0, \infty)$ the function
     $Z \mapsto E(e^{v_0 N(\tau)} \mid Z)$ is uniformly bounded for
     $Z \in \mathcal{Z}$.

C11. The observation time points are $s_0$-separated: that is, there exists
     a constant $s_0 > 0$ such that
     $P(T_{K,j} - T_{K,j-1} \ge s_0$ for all $j = 1, \ldots, K) = 1$.
     Furthermore, $\mu_1$ is absolutely continuous with respect to Lebesgue
     measure $\lambda$ with a derivative $\dot\mu_1$ satisfying
     $\dot\mu_1(t) \ge c_0 > 0$ for some positive constant $c_0$.

C12. The true baseline mean function $\Lambda_0$ is differentiable and the
     derivative has positive and finite lower and upper bounds in the
     observation interval, that is, there exists a constant
     $0 < f_0 < \infty$ such that
     $1/f_0 \le \Lambda_0'(t) \le f_0 < \infty$ for $t \in O[T]$.

C13. For some $\eta \in (0, 1)$,
     $a^T \mathrm{Var}(Z \mid U) a \ge \eta\, a^T E(Z Z^T \mid U) a$ a.s. for
     all $a \in \mathbb{R}^d$, where $(U, Z)$ has distribution
     $\nu_1/\nu_1(\mathbb{R}^+ \times \mathcal{Z})$.

C14. For some $\eta \in (0, 1)$,
     $a^T \mathrm{Var}(Z \mid U, V) a \ge \eta\, a^T E(Z Z^T \mid U, V) a$ a.s.
     for all $a \in \mathbb{R}^d$, where $(U, V, Z)$ has distribution
     $\nu_2/\nu_2(\mathbb{R}^{+2} \times \mathcal{Z})$.

Theorem 3.2. In addition to the conditions required for the consistency,
suppose C8, C9, C10 and C13 hold with the constant $v_0$ in C10 satisfying
$v_0 \ge 4k_0(1 + \delta_0^{ps})^2$ with
$\delta_0^{ps} = \sqrt{c_0 \Lambda_0^3(\sigma)/(24 \cdot 8 f_0)}$ and
$\mu_1(\{\tau\}) > 0$. Then

   $n^{1/3} d_1((\hat\beta_n^{ps}, \hat\Lambda_n^{ps}), (\beta_0, \Lambda_0)) = O_P(1)$.

Moreover, if conditions C11, C12 and C14 hold along with the conditions
listed above but with the constant $v_0$ in C10 satisfying
$v_0 \ge 4k_0(1 + \delta_0)^2$ with $\delta_0 = \ldots$, then

   $n^{1/3} d_2((\hat\beta_n, \hat\Lambda_n), (\beta_0, \Lambda_0)) = O_P(1)$.

Remark 3.2. Conditions C8, C9, C10, C11 and C12 are sufficient for
validity of Theorem 3.2, but they are probably not necessary. Conditions
C9 and C10 are mainly used in deriving the rate of convergence when the
counting process $N$ is allowed to be general [but satisfying the mean model
(1.1)]. C8 says that all the observations should fall in a fixed interval in
which the mean function is bounded away from zero and C9 indicates that
the number of observations is bounded. These conditions are generally true
in clinical applications. Condition C10 holds for all $v_0 > 0$ if the
counting process is uniformly bounded (which can be justified in many
applications) or forms a Poisson process, conditionally on covariates. The
first part of C11 requires that two adjacent observation times should be at
least $s_0$ apart, an assumption which is very reasonable in practice. The
second part of C11 implies that the "total observation measure" $\mu_1$ has a
strictly positive intensity (or density). C12 requires that the true
baseline mean function should be absolutely continuous with bounded
intensity function. While C12 is a reasonable assumption in practice, it
may be stronger than necessary. We assume C12 mainly for technical
convenience in our proofs.

Remark 3.3. The metrics $d_1$ and $d_2$ are closely related. Since
$\sum_{j=1}^{k}(a_j - b_j)^2 \le k^2 \sum_{j=1}^{k}\{(a_j - a_{j-1}) - (b_j - b_{j-1})\}^2$
(see Wellner and Zhang [20] for a proof), the two metrics are equivalent
under C9 and therefore the consistency and rate of convergence results for
the maximum likelihood estimator $(\hat\beta_n, \hat\Lambda_n)$ hold under
the metric $d_1$ as well.

Remark 3.4. Condition C13 can be justified in many applications.
By the Markov inequality, it is easy to see that condition C7 implies that
$E(ZZ^T)$ is a positive definite matrix. Let $E_1$ and $\mathrm{Var}_1$
denote expectations and variances under the probability measure
$\nu_1/\nu_1(\mathbb{R}^+ \times \mathcal{Z})$. If we assume that
$\mathrm{Var}_1(Z \mid U)$ is a positive definite matrix, and we set
$\lambda_1 = \max\{\mathrm{eigenvalue}(E_1(ZZ^T \mid U))\}$ and
$\lambda_d^* = \min\{\mathrm{eigenvalue}(\mathrm{Var}_1(Z \mid U))\}$, then
$0 < \lambda_d^* \le \lambda_1$. Therefore, for any $a \in \mathbb{R}^d$,

   $a^T \mathrm{Var}_1(Z \mid U) a \ge a^T \lambda_d^* a = \frac{\lambda_d^*}{\lambda_1} a^T \lambda_1 a \ge \frac{\lambda_d^*}{\lambda_1} a^T E_1(ZZ^T \mid U) a$.

Thus, condition C13 holds by taking $\eta \le \lambda_d^*/\lambda_1$. Note
that both $\lambda_1$ and $\lambda_d^*$ depend on $U$ in general and the
argument here works assuming that this ratio has a positive lower bound
uniformly in $U$. We can justify C14 similarly.

Although the overall convergence rate for both the maximum
pseudo-likelihood and maximum likelihood estimators is of the order
$n^{-1/3}$, the rate of convergence for the estimators of the regression
parameter, as usual, may still be $n^{-1/2}$. Similar to the results of
Huang [7] for the Cox model with current status data, we can establish
asymptotic normality of both $\hat\beta_n^{ps}$ and $\hat\beta_n$.

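Theorem 3.3. Under the same conditions assumed in Theorem 3.2, the
estimators $\hat\beta_n^{ps}$ and $\hat\beta_n$ are asymptotically normal,

(3.1)   $\sqrt{n}(\hat\beta_n - \beta_0) \to_d Z \sim N_d(0, A^{-1}B(A^{-1})^T)$

and

(3.2)   $\sqrt{n}(\hat\beta_n^{ps} - \beta_0) \to_d Z^{ps} \sim N_d(0, (A^{ps})^{-1}B^{ps}((A^{ps})^{-1})^T)$,

where

   $B = E\{\sum_{j,j'=1}^{K} C_{j,j'}(Z)[Z - R(K, T_{K,j-1}, T_{K,j})][Z - R(K, T_{K,j'-1}, T_{K,j'})]^T\}$,

   $A = E\{\sum_{j=1}^{K} \Delta\Lambda_{0Kj} e^{\beta_0^T Z}[Z - R(K, T_{K,j-1}, T_{K,j})]^{\otimes 2}\}$,

   $B^{ps} = E\{\sum_{j,j'=1}^{K} C^{ps}_{j,j'}(Z)[Z - R^{ps}(K, T_{K,j})][Z - R^{ps}(K, T_{K,j'})]^T\}$,

   $A^{ps} = E\{\sum_{j=1}^{K} \Lambda_{0Kj} e^{\beta_0^T Z}[Z - R^{ps}(K, T_{K,j})]^{\otimes 2}\}$,

in which
$R(K, T_{K,j}, T_{K,j'}) \equiv E(Ze^{\beta_0^T Z} \mid K, T_{K,j}, T_{K,j'})/E(e^{\beta_0^T Z} \mid K, T_{K,j}, T_{K,j'})$,
$R^{ps}(K, T_{K,j}) \equiv E(Ze^{\beta_0^T Z} \mid K, T_{K,j})/E(e^{\beta_0^T Z} \mid K, T_{K,j})$,
$C_{j,j'}(Z) = \mathrm{Cov}[\Delta N_{Kj}, \Delta N_{Kj'} \mid Z, K, T_K]$,
$C^{ps}_{j,j'}(Z) = \mathrm{Cov}[N(T_{K,j}), N(T_{K,j'}) \mid Z, K, T_{K,j}, T_{K,j'}]$,
$\Lambda_{0Kj} = \Lambda_0(T_{K,j})$ and
$\Delta\Lambda_{0Kj} = \Lambda_0(T_{K,j}) - \Lambda_0(T_{K,j-1})$.

If the counting process is, conditionally given $Z$, a nonhomogeneous
Poisson process with conditional mean function given as specified, then
$C_{j,j'}(Z) = \Delta\Lambda_{0Kj} e^{\beta_0^T Z} 1\{j = j'\}$. It follows
that $B = A = I(\beta_0)$, the information matrix computed in Wellner, Zhang
and Liu [23], and hence $A^{-1}B(A^{-1})^T = I^{-1}(\beta_0)$. This implies
that the estimator $\hat\beta_n$ under the conditional Poisson process is
asymptotically efficient. However, since
$C^{ps}_{j,j'}(Z) = e^{\beta_0^T Z}\Lambda_{0K(j \wedge j')}$, we have
$B^{ps} \ne A^{ps}$. This shows that the semiparametric maximum
pseudo-likelihood estimator $\hat\beta_n^{ps}$ will not be asymptotically
efficient under the Poisson assumption.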

There is, however, a natural "Poisson regression" model for which the
maximum pseudo-likelihood estimator is asymptotically efficient: if we
simply assume that the conditional distribution of
$(N(T_{K,1}), \ldots, N(T_{K,K}))$ given $(K, T_{K,1}, \ldots, T_{K,K}, Z)$
is that of a vector of independent Poisson random variables with means given
by $\Lambda(T_{K,j} \mid Z) = \exp(\beta_0^T Z)\Lambda_0(T_{K,j})$ for
$j = 1, \ldots, K$, then

   $C^{ps}_{j,j'}(Z) = \mathrm{Cov}[N(T_{K,j}), N(T_{K,j'}) \mid Z, K, T_{K,j}, T_{K,j'}] = \Lambda(T_{K,j} \mid Z) 1\{j = j'\}$.

Hence $B^{ps} = A^{ps} = I^{PoissRegr}(\beta_0)$ and $\hat\beta_n^{ps}$ is
asymptotically efficient for this alternative model. In practice, this
occurs when $(N(T_{K,1}), N(T_{K,2}), \ldots, N(T_{K,K}))$ consists of
cluster Poisson count data in which the counts within a cluster are
independent.

4. Numerical results. 

4.1. Simulation studies. We generated data using the same schemes as 
those given in Zhang [26]. Monte Carlo bias, standard deviation and mean 
squared error of the maximum pseudo-likelihood and maximum likelihood 
estimates are then compared. 

Scenario 1. In this scenario, the data are
$\{(Z_i, K_i, T^{(i)}_{K_i}, N^{(i)}_{K_i}) : i = 1, 2, \ldots, n\}$ with
$Z_i = (Z_{i,1}, Z_{i,2}, Z_{i,3})$ where, conditionally on
$(Z_i, K_i, T^{(i)}_{K_i})$, the counts $N^{(i)}_{K_i}$ were generated from a
Poisson process. For each subject, we generate data by the following scheme:
$Z_{i,1} \sim \mathrm{Unif}(0, 1)$, $Z_{i,2} \sim N(0, 1)$,
$Z_{i,3} \sim \mathrm{Bernoulli}(0.5)$; $K_i$ is sampled randomly from the
discrete set $\{1, 2, 3, 4, 5, 6\}$. Given $K_i$,
$T^{(i)}_{K_i} = (T^{(i)}_{K_i,1}, T^{(i)}_{K_i,2}, \ldots, T^{(i)}_{K_i,K_i})$
are the order statistics of $K_i$ random observations generated from
$\mathrm{Unif}(1, 10)$ and rounded to the second decimal point to make the
observation times possibly tied. The panel counts

   $N^{(i)}_{K_i} = (N^{(i)}(T^{(i)}_{K_i,1}), N^{(i)}(T^{(i)}_{K_i,2}), \ldots, N^{(i)}(T^{(i)}_{K_i,K_i}))$

are generated from the Poisson process with the conditional mean function
given by $\Lambda(t \mid Z_i) = 2t\exp(\beta_0^T Z_i)$, that is,

   $N^{(i)}(T^{(i)}_{K_i,j}) - N^{(i)}(T^{(i)}_{K_i,j-1}) \sim \mathrm{Poisson}\{2(T^{(i)}_{K_i,j} - T^{(i)}_{K_i,j-1})\exp(\beta_0^T Z_i)\}$,

where $\beta_0 = (\beta_1, \beta_2, \beta_3)^T = (-1.0, 0.5, 1.5)^T$.
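
The data-generating scheme of Scenario 1 can be sketched in Python as follows. This is an illustrative reading of the description above, not the authors' simulation code; the function name and return layout are hypothetical. Because $\Lambda(t \mid Z_i) = 2t\exp(\beta_0^T Z_i)$ is linear in $t$, the process is homogeneous with rate $2\exp(\beta_0^T Z_i)$, so the sketch simulates event times from exponential gaps and counts the events up to each observation time, which yields independent Poisson increments with the stated means.

```python
import math
import random

BETA0 = (-1.0, 0.5, 1.5)  # true regression parameter in Scenario 1


def simulate_subject(rng):
    """Generate (Z, K, observation times, panel counts) for one subject."""
    z = (rng.uniform(0.0, 1.0),                # Z_1 ~ Unif(0, 1)
         rng.gauss(0.0, 1.0),                  # Z_2 ~ N(0, 1)
         1.0 if rng.random() < 0.5 else 0.0)   # Z_3 ~ Bernoulli(0.5)
    k = rng.randint(1, 6)                      # K uniform on {1, ..., 6}
    # Order statistics of K draws from Unif(1, 10), rounded to 2 decimals.
    times = sorted(round(rng.uniform(1.0, 10.0), 2) for _ in range(k))
    rate = 2.0 * math.exp(sum(b * zk for b, zk in zip(BETA0, z)))
    # Homogeneous Poisson process on (0, T_K]: exponential inter-event gaps.
    events = []
    t = rng.expovariate(rate)
    while t <= times[-1]:
        events.append(t)
        t += rng.expovariate(rate)
    counts = [sum(1 for e in events if e <= tj) for tj in times]
    return z, k, times, counts


rng = random.Random(2007)
sample = [simulate_subject(rng) for _ in range(50)]  # one data set with n = 50
```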

For this scenario, we can directly calculate the asymptotic covariance
matrices given in Theorem 3.3,
$\Sigma^{ps} = (A^{ps})^{-1}B^{ps}((A^{ps})^{-1})^T = (1582/17787)\,W^{-1}$
and $\Sigma = A^{-1}B(A^{-1})^T = A^{-1} = (1260/19179)\,W^{-1}$,
respectively, where
$W = E\{e^{\beta_0^T Z}[Z - E(Ze^{\beta_0^T Z})/E(e^{\beta_0^T Z})]^{\otimes 2}\}$.
Since it is difficult to evaluate the matrix $W$ analytically, we calculated
it numerically using Mathematica (Wolfram [24]) to obtain the following
approximate results for the asymptotic covariance matrices:

(4.1)   $\Sigma^{ps} \approx [\ldots]$

and

(4.2)   $\Sigma \approx [\ldots]$.


We conducted simulation studies with sample sizes of $n = 50$ and
$n = 100$, respectively. For each case, the Monte Carlo sample bias,
standard deviation and mean squared error for the semiparametric estimators
of the regression parameters are reported in Table 1. We also include the
asymptotic standard errors obtained from (4.1) and (4.2) in Table 1 to
compare with the Monte Carlo sample standard deviations. The results show
that the sample bias for both estimators is small, that the standard
deviation and mean squared error are smaller for the maximum likelihood
method than for the pseudo-likelihood method, and that the standard
deviation and mean squared error decrease at the rates $n^{-1/2}$ and
$n^{-1}$, respectively, as the sample size increases. Moreover, the standard
errors of estimates based on asymptotic theory are all close to the
corresponding standard deviations based on the Monte Carlo simulations. All
of these provide numerical support for our asymptotic results in
Theorem 3.3.
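
The stated rates can be checked directly against the entries for $\beta_1$ in Table 1. The Python sketch below (variable names are hypothetical) recomputes the mean squared error from the reported bias and standard deviation using $\mathrm{MSE} = \mathrm{bias}^2 + \mathrm{SD}^2$, and forms the ratio of the standard deviations at $n = 50$ and $n = 100$, which should be near $\sqrt{2} \approx 1.41$ if the standard deviation decreases as $n^{-1/2}$.

```python
# (n, method, bias, sd, reported MSE x 10^2) for beta_1, copied from Table 1
ROWS = [
    (50, "pseudo", 0.0020, 0.1193, 1.4236),
    (50, "mle", 0.0018, 0.1019, 1.0387),
    (100, "pseudo", 0.0017, 0.0806, 0.6499),
    (100, "mle", 0.0015, 0.0694, 0.4819),
]


def mse_x100(bias, sd):
    """Mean squared error times 10^2 from bias and standard deviation."""
    return 100.0 * (bias ** 2 + sd ** 2)


recomputed = {(n, m): mse_x100(b, s) for n, m, b, s, _ in ROWS}
reported = {(n, m): r for n, m, _, _, r in ROWS}
sd = {(n, m): s for n, m, _, s, _ in ROWS}

# Ratio of Monte Carlo SDs when the sample size doubles from 50 to 100.
sd_ratio = {m: sd[(50, m)] / sd[(100, m)] for m in ("pseudo", "mle")}
```

The recomputed values agree with the reported ones to the precision of the table, and both ratios are roughly 1.47 to 1.48, slightly above $\sqrt{2}$, as one might expect at these moderate sample sizes.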

Based on the results of 1000 Monte Carlo samples, we plot the pointwise
means and 2.5- and 97.5-percentiles of both estimators of the baseline mean
function $\Lambda_0(t) = 2t$ in Figure 1. It clearly shows that both
estimators seem to have negligible bias and the maximum likelihood
estimator has smaller variability compared to the maximum
pseudo-likelihood estimator. When sample size increases, the variability of
both estimators decreases accordingly.

Table 1
Results of the Monte Carlo simulation studies for the regression parameter
estimates based on 1000 repeated samples for data generated from the
conditional Poisson process

                           n = 50                    n = 100
                    Pseudo-                    Pseudo-
                    likelihood  Likelihood     likelihood  Likelihood
Estimate of $\beta_1$
  BIAS               0.0020      0.0018         0.0017      0.0015
  SD                 0.1193      0.1019         0.0806      0.0694
  ASE                0.1069      0.0919         0.0758      0.0649
  MSE × 10^2         1.4236      1.0387         0.6499      0.4819
Estimate of $\beta_2$
  BIAS              -0.0003     -0.0016         0.0028      0.0022
  SD                 0.0349      0.0294         0.0231      0.0193
  ASE                0.0301      0.0259         0.0213      0.0183
  MSE × 10^2         0.1218      0.0867         0.0541      0.0377
Estimate of $\beta_3$
  BIAS               0.0023      0.0011         0.0016     -0.0009
  SD                 0.0830      0.0712         0.0579      0.0497
  ASE                0.0779      0.0670         0.0551      0.0474
  MSE × 10^2         0.6894      0.5071         0.3355      0.2471

Scenario 2. In this scenario, the data are
$\{(Z_i, K_i, T^{(i)}_{K_i}, N^{(i)}_{K_i}) : i = 1, 2, \ldots, n\}$ with
$Z_i = (Z_{i,1}, Z_{i,2}, Z_{i,3})$ and, conditionally on
$(Z_i, K_i, T^{(i)}_{K_i})$, the counts $N^{(i)}_{K_i}$ were generated from a
mixed Poisson process. For each subject, $(Z_i, K_i, T^{(i)}_{K_i})$ are
generated in exactly the same way as in Scenario 1. The panel counts are,
however, generated from a homogeneous Poisson process with a random effect
on the intensity: given subject $i$ with covariates $Z_i$ and frailty
variable $\alpha_i$ (independent of $Z_i$), the counts are generated from
the Poisson process with intensity
$(\lambda + \alpha_i)\exp(\beta_0^T Z_i)$, where $\lambda = 2.0$ and
$\alpha_i \in \{-0.4, 0, 0.4\}$ with probabilities 0.25, 0.5 and 0.25,
respectively.
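
To see why this design keeps the proportional mean structure while breaking the Poisson property, the following Python sketch (function and constant names are hypothetical) computes the conditional mean and variance of $N(t)$ given $Z$ by averaging over the three frailty values. Since $E(\alpha_i) = 0$, the mean is $2t\exp(\beta_0^T Z)$ as in (1.1); the variance exceeds the mean by $\mathrm{Var}(\alpha_i)\,t^2\exp(2\beta_0^T Z)$ with $\mathrm{Var}(\alpha_i) = 0.08$, so the counts are overdispersed relative to a Poisson process.

```python
import math

# Frailty values and their probabilities, as described in Scenario 2.
FRAILTY = ((-0.4, 0.25), (0.0, 0.5), (0.4, 0.25))


def moments(t, lin_pred, lam=2.0):
    """Mean and variance of N(t) given Z, where lin_pred = beta_0' Z.

    Given the frailty alpha, N(t) is Poisson with mean
    (lam + alpha) * exp(lin_pred) * t, so E[N^2 | alpha] = m + m^2.
    """
    e = math.exp(lin_pred)
    cond_means = [((lam + a) * e * t, p) for a, p in FRAILTY]
    mean = sum(m * p for m, p in cond_means)
    second = sum((m + m * m) * p for m, p in cond_means)
    return mean, second - mean ** 2


mean, var = moments(3.0, 0.5)  # illustrative values t = 3, beta_0' Z = 0.5
```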

In this scenario, the counting process given only the covariates is not a
Poisson process. However, the conditional mean function of the counting
process given the covariates still satisfies (1.1) with
$\Lambda_0(t) = 2t$ and thus our proposed methods are expected to be valid
for this case as well. The asymptotic variances given in Theorem 3.3 for
this scenario are

   $\Sigma^{ps} = (A^{ps})^{-1}B^{ps}((A^{ps})^{-1})^T = \frac{1582}{17787}W^{-1} + [\ldots]\,W^{-1}\widetilde{W}(W^{-1})^T$

and

   $\Sigma = A^{-1}B(A^{-1})^T = \frac{1260}{19179}W^{-1} + \frac{[\ldots]}{19179^2}W^{-1}\widetilde{W}(W^{-1})^T$,

respectively, where
$\widetilde{W} = E\{e^{2\beta_0^T Z}[Z - E(Ze^{\beta_0^T Z})/E(e^{\beta_0^T Z})]^{\otimes 2}\}$.
Using Mathematica (Wolfram [24]) to calculate the asymptotic covariance
matrices numerically yields

(4.3)   $\Sigma^{ps} \approx \begin{pmatrix} 1.172450 & -0.023852 & -0.043178 \\ -0.023852 & 0.108760 & 0.022975 \\ -0.043178 & 0.022975 & 0.448444 \end{pmatrix}$

and

(4.4)   $\Sigma \approx \begin{pmatrix} 0.918986 & -0.019718 & -0.035696 \\ -0.019718 & 0.085924 & 0.018994 \\ -0.035696 & 0.018994 & 0.343985 \end{pmatrix}$.

[Figure 1: four panels (Scenario 1: n = 50; Scenario 1: n = 100;
Scenario 2: n = 50; Scenario 2: n = 100); horizontal axis: Observation Time.]

Fig. 1. The pointwise means, 2.5-percentiles and 97.5-percentiles of both
the maximum pseudo-likelihood and likelihood estimators of the baseline mean
function under the proportional mean model. MPLE: The maximum
pseudo-likelihood estimator; MLE: The maximum likelihood estimator.


As in Scenario 1, we conducted simulation studies with sample sizes of
$n = 50$ and $n = 100$, respectively. For each case, the Monte Carlo sample
bias, standard deviation and mean squared error for the semiparametric
estimators of the regression parameters are computed with 1000 repeated
samples. The results are shown in Table 2. In Figure 1, we also plot the
pointwise means, 2.5-percentiles and 97.5-percentiles of both estimators of
the unconditional baseline mean function $\Lambda_0(t) = 2t$ based on the
results obtained from 1000 Monte Carlo samples. We observe the same
phenomenon as in Scenario 1: for the regression parameters, both the
standard deviation and the mean squared error using the maximum likelihood
method are smaller than those using the pseudo-likelihood method while the
bias is relatively small; for the baseline mean function, both estimators
have a negligible bias but the maximum likelihood estimator has less
variability than the maximum pseudo-likelihood estimator. We also note that
the variability of the semiparametric estimators is larger than that of
their counterparts in Scenario 1. This may be caused by violation of the
assumption of a conditional Poisson process given only the covariates. We
also include the asymptotic standard errors of the regression parameter
estimates based on (4.3) and (4.4) in Table 2. Again the standard errors
derived from the asymptotic theory are all close to the standard deviations
based on Monte Carlo simulations.
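
The asymptotic standard errors reported in Table 2 can be reproduced from the diagonal entries of (4.3) and (4.4) through $\mathrm{ASE} = \sqrt{\Sigma_{kk}/n}$, which follows from the $\sqrt{n}$-normalization in Theorem 3.3. A short Python check (constant and function names are hypothetical):

```python
import math

SIGMA_PS_DIAG = (1.172450, 0.108760, 0.448444)  # diagonal of (4.3)
SIGMA_DIAG = (0.918986, 0.085924, 0.343985)     # diagonal of (4.4)


def asymptotic_se(diag, n):
    """Asymptotic standard errors sqrt(Sigma_kk / n) for sample size n."""
    return [math.sqrt(v / n) for v in diag]


ase = {
    ("pseudo", 50): asymptotic_se(SIGMA_PS_DIAG, 50),
    ("mle", 50): asymptotic_se(SIGMA_DIAG, 50),
    ("pseudo", 100): asymptotic_se(SIGMA_PS_DIAG, 100),
    ("mle", 100): asymptotic_se(SIGMA_DIAG, 100),
}
```

For example, $\sqrt{1.172450/50} \approx 0.1531$ and $\sqrt{0.918986/100} \approx 0.0959$, matching the ASE rows for $\beta_1$ in Table 2.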

Table 2
Results of the Monte Carlo simulation studies for the regression parameter
estimates based on 1000 repeated samples for data generated from the mixed
Poisson process

                           n = 50                    n = 100
                    Pseudo-                    Pseudo-
                    likelihood  Likelihood     likelihood  Likelihood
Estimate of $\beta_1$
  BIAS               0.0038      0.0029        -0.0068     -0.0072
  SD                 0.1556      0.1415         0.1138      0.0993
  ASE                0.1531      0.1356         0.1083      0.0959
  MSE × 10^2         2.4226      2.0003         1.2997      0.9912
Estimate of $\beta_2$
  BIAS              -0.0008     -0.0001         0.0012      0.0017
  SD                 0.0467      0.0425         0.0318      0.0297
  ASE                0.0466      0.0415         0.0330      0.0293
  MSE × 10^2         0.2182      0.1806         0.1013      0.0885
Estimate of $\beta_3$
  BIAS               0.0096      0.0099         0.0061      0.0040
  SD                 0.0972      0.0888         0.0666      0.0581
  ASE                0.0947      0.0829         0.0670      0.0587
  MSE × 10^2         0.9540      0.7983         0.4473      0.3392

These simulation studies provide numerical support for the statement that
the proposed semiparametric estimation methods are robust to violations of
the underlying conditional Poisson process assumption. These methods are
valid as long as the proportional mean function model (1.1) holds. We have
also carried out several analytical comparisons of the semiparametric
efficiency of the maximum pseudo-likelihood and maximum likelihood
estimation methods. There is considerable evidence that the maximum
likelihood method (based on the Poisson process assumption) is more
efficient than the pseudo-likelihood method both on and off the Poisson
model, with large differences occurring when the distribution of $K$ is
heavy-tailed. The detailed analytical results are presented in Wellner,
Zhang and Liu [23].

4.2. A real data example. Using the semiparametric methods proposed 
in the preceding sections, we analyze the bladder tumor data extracted from 
Andrews and Herzberg ([1], pages 253-260). This data set comes from a 
bladder tumor study conducted by the Veterans Administration Coopera- 
tive Urological Research (Byar, Blackard and Vacurg [2]). In the study, a 
randomized clinical trial of three treatments, placebo, pyridoxine pills and 
thiotepa instillation into the bladder was conducted for patients with super- 
ficial bladder tumor when entering the trial. At each follow-up visit, tumors 
were counted, measured and then removed if observed, and the treatment 
was continued. The treatment effects, especially the thiotepa instillation, on 
suppressing the recurrence of bladder tumor have been explored by many 
authors, for example, Wei, Lin and Weissfeld [19], Sun and Wei [14], Wellner 
and Zhang [21] and Zhang [26]. 

In this paper, we study the proportional mean model that has been pro- 
posed by Sun and Wei [14] and Zhang [26], 

(4.5) $E\{N(t) \mid Z\} = \Lambda_0(t)\exp(\beta_1 Z_1 + \beta_2 Z_2 + \beta_3 Z_3 + \beta_4 Z_4)$,

where $Z_1$ and $Z_2$ represent the number and size of bladder tumors at the beginning of the trial, and $Z_3$ and $Z_4$ are the indicators for the pyridoxine pill and thiotepa instillation treatments, respectively. We choose $\beta^{(0)} = (0, 0, 0, 0)$ to start our iterative algorithm and $\eta = 10^{-10}$ as the convergence criterion to stop the algorithm. Since the asymptotic variances are difficult to estimate,
we adopt the bootstrap procedure to estimate the asymptotic standard error 
of the semiparametric estimates of the regression parameters. We generated 
200 bootstrap samples and calculated the proposed estimators for each sam- 
pled data set. The sample standard deviation of the estimates based on these 
200 bootstrap samples is used to estimate the asymptotic standard error. 
The inference based on the bootstrap estimator for asymptotic standard er- 
ror is given in Table 3. The semiparametric maximum pseudo-likelihood and 
maximum likelihood estimators of the baseline mean function are plotted in 
Figure 2. 
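The bootstrap step described above can be sketched as follows. This is a minimal illustration, not the authors' R program: the function name `bootstrap_se` and the `estimator` callable are hypothetical, and the sketch assumes that whole subjects (with all their panel observations) are resampled with replacement, which is the usual choice for i.i.d. subjects.

```python
import random
import statistics


def bootstrap_se(subjects, estimator, n_boot=200, seed=0):
    """Bootstrap standard error of a scalar estimator.

    subjects  : list of per-subject records; each record is resampled as a unit
    estimator : hypothetical callable mapping a list of subjects to a float
                (for example, one component of a fitted regression parameter)
    n_boot    : number of bootstrap samples (200 in the text)
    """
    rng = random.Random(seed)
    n = len(subjects)
    estimates = []
    for _ in range(n_boot):
        # Draw n subjects with replacement from the original sample.
        resample = [subjects[rng.randrange(n)] for _ in range(n)]
        estimates.append(estimator(resample))
    # The sample standard deviation of the bootstrap estimates
    # serves as the standard error estimate.
    return statistics.stdev(estimates)
```

With a semiparametric fit in place of `estimator`, each bootstrap replicate requires rerunning the full iterative algorithm, so the cost is roughly `n_boot` times that of a single fit.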

Both methods yield the same conclusion: the baseline number of tumors (the number of tumors observed when entering the trial) significantly affects the recurrence of the tumor at level 0.05 (p-value = 0.0105 and 0.0078, respectively, for the maximum pseudo-likelihood and maximum likelihood methods), and the thiotepa instillation treatment appears to reduce the recurrence of tumor significantly (p-value = 0.0186 and 0.0269, respectively, for the maximum pseudo-likelihood and maximum likelihood methods).
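The p-values quoted above are consistent with two-sided tests that treat the ratio of an estimate to its bootstrap standard error as approximately standard normal. The sketch below recomputes them from the estimates and standard errors reported in Table 3; the normal reference distribution is an assumption inferred from the numbers, and the helper name is illustrative.

```python
import math


def two_sided_p_value(z):
    """Two-sided p-value 2 * (1 - Phi(|z|)) for a standard normal statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# (estimate, bootstrap standard error) pairs taken from Table 3.
rows = {
    "Z1, pseudo-likelihood": (0.1446, 0.0565),
    "Z1, likelihood": (0.2069, 0.0778),
    "Z4, pseudo-likelihood": (-0.6881, 0.2923),
    "Z4, likelihood": (-0.7972, 0.3603),
}

for label, (estimate, se) in rows.items():
    z = estimate / se
    print(f"{label}: z = {z:.4f}, p = {two_sided_p_value(z):.4f}")
```

The printed values should agree with the tabulated statistics and p-values up to rounding of the reported estimates.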

In Figure 2, we can see that the maximum likelihood estimator of the baseline mean function is substantially smaller than the maximum pseudo-likelihood estimator, which preserves the phenomenon we have observed in nonparametric estimation methods for this data set studied in Wellner and Zhang [21].

We also notice that the maximum likelihood method, in contrast to what we have observed through both the simulation and analytical studies, yields larger standard errors than the pseudo-likelihood method. Violation of the proportional mean model (4.5) for this data set could be the explanation for this result, since Zhang [27] plotted the nonparametric pseudo-likelihood estimators of the mean function for each of the three treatments and found that the estimators cross over. While plotting the nonparametric estimators of the mean function for the groups defined by covariates is a reasonable first step in an exploration of the validity of the proposed model, it would be preferable to proceed via more quantitative measures, such as appropriate goodness-of-fit statistics. The construction of goodness-of-fit test statistics for regression modeling of panel count data remains an open problem for future research.

All the numerical experiments in this paper were implemented in R. The 
computing programs are available from the second author. 



Table 3
Semiparametric inference for the bladder tumor study based on 200 bootstrap samples from the original data set

Variable   Method              $\hat\beta$   se($\hat\beta$)   $\hat\beta$/se($\hat\beta$)   p-value
$Z_1$      Pseudo-likelihood     0.1446       0.0565             2.5593                      0.0105
           Likelihood            0.2069       0.0778             2.6594                      0.0078
$Z_2$      Pseudo-likelihood    -0.0450       0.0632            -0.7120                      0.4746
           Likelihood           -0.0355       0.0861            -0.4123                      0.6801
$Z_3$      Pseudo-likelihood     0.1951       0.3233             0.6035                      0.5462
           Likelihood            0.0664       0.4310             0.1541                      0.8775
$Z_4$      Pseudo-likelihood    -0.6881       0.2923            -2.3541                      0.0186
           Likelihood           -0.7972       0.3603            -2.2126                      0.0269

Fig. 2. The two estimators of the baseline mean function under the proportional mean model for the bladder tumor example.



5. Asymptotic theory: proofs. We use empirical process theory to study 
the asymptotic properties of the semiparametric maximum pseudo-likelihood 
and maximum likelihood estimators. The proof of Theorem 3.1 is closely re- 
lated to the proof of Theorem 4.1 of Wellner and Zhang [21]. The rate of 
convergence is derived based on the general theorem for the rate of con- 
vergence given in Theorem 3.2.5 of van der Vaart and Wellner [18]. The 
asymptotic normality proofs for both $\hat\beta_n^{ps}$ and $\hat\beta_n$ are based on the general
theorem for M-estimation of regression parameters in the presence of a non- 
parametric nuisance parameter, which is stated (and proved) in Section 6. 

Proof of Theorem 3.1. Zhang [26] has given a proof for the first part of the theorem concerning the semiparametric maximum pseudo-likelihood estimator. Unfortunately, his proof of Theorem 1 on pages 47 and 48 is not correct (in particular, the conditions imposed do not suffice for identifiability as claimed). Here we give proofs for both the maximum pseudo-likelihood and maximum likelihood estimators.

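We first prove the claims concerning the pseudo-likelihood estimators $(\hat\beta_n^{ps}, \hat\Lambda_n^{ps})$. Let $\mathbb{M}_n^{ps}(\theta) = n^{-1} l_n^{ps}(\beta, \Lambda) = \mathbb{P}_n m_\theta^{ps}(X)$ and $\mathbb{M}^{ps}(\theta) = P m_\theta^{ps}(X)$, where

$$m_\theta^{ps}(X) = \sum_{j=1}^{K} \{N_{Kj}\log\Lambda_{Kj} + N_{Kj}\beta^T Z - \Lambda_{Kj}\exp(\beta^T Z)\}.$$

First, we show that $\mathbb{M}^{ps}$ has $\theta_0 = (\beta_0, \Lambda_0)$ as its unique maximizing point. Computing the expectation conditionally on $(Z, K, T_K)$ yields

$$\mathbb{M}^{ps}(\theta_0) - \mathbb{M}^{ps}(\theta) = \int \Lambda(u)\exp(\beta^T z)\, h\!\left(\frac{\Lambda_0(u)\exp(\beta_0^T z)}{\Lambda(u)\exp(\beta^T z)}\right) d\nu_1(u, z),$$

where $h(x) = x\log(x) - x + 1$. The function $h(x)$ satisfies $h(x) \ge 0$ for $x > 0$ with equality holding only at $x = 1$. Hence $\mathbb{M}^{ps}(\theta_0) \ge \mathbb{M}^{ps}(\theta)$ and $\mathbb{M}^{ps}(\theta_0) = \mathbb{M}^{ps}(\theta)$ if and only if

(5.1) $$\frac{\Lambda_0(u)\exp(\beta_0^T z)}{\Lambda(u)\exp(\beta^T z)} = 1 \quad \text{a.e. with respect to } \nu_1.$$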

This implies that

(5.2) $\beta = \beta_0$ and $\Lambda(u) = \Lambda_0(u)$ a.e. with respect to $\mu_1$

by C2 and C7. Here is a proof of this claim: Let

$$f_1(u) = \Lambda(u) - \Lambda_0(u), \qquad f_2(u) = \Lambda_0(u),$$

$$h_1(z) = \exp(\beta^T z), \qquad h_2(z) = \exp(\beta^T z) - \exp(\beta_0^T z).$$

Then (5.1) implies that $\Lambda_0(u)\exp(\beta_0^T z) = \Lambda(u)\exp(\beta^T z)$ a.e. $\nu_1$, or, equivalently

$$0 = \{\Lambda(u) - \Lambda_0(u)\}e^{\beta^T z} + \Lambda_0(u)(e^{\beta^T z} - e^{\beta_0^T z}) = f_1(u)h_1(z) + f_2(u)h_2(z) \quad \text{a.e. } \nu_1.$$

Since $\mu_1 \times H$ is absolutely continuous with respect to $\nu_1$ by assumption C2, equality holds in the last display a.e. with respect to $\mu_1 \times H$. By multiplying across the identity in the last display by $ab$, integrating with respect to the measure $\mu_1 \times H$, and then applying Fubini's theorem, it follows that

$$0 = \int f_1 a\, d\mu_1 \int h_1 b\, dH + \int f_2 a\, d\mu_1 \int h_2 b\, dH$$

for all measurable functions $a = a(u)$ and $b = b(z)$. The choice of $a = f_1 1_A$ for $A \in \mathcal{B}_1$ and $b = h_1 1_B$ for $B \in \mathcal{B}_d$ yields

$$0 = \int f_1^2 1_A\, d\mu_1 \int h_1^2 1_B\, dH + \int f_1 f_2 1_A\, d\mu_1 \int h_1 h_2 1_B\, dH;$$

the choice of $a = f_2 1_A$ (for the same $A \in \mathcal{B}_1$) and $b = h_2 1_B$ (for the same set $B \in \mathcal{B}_d$) yields

$$0 = \int f_1 f_2 1_A\, d\mu_1 \int h_1 h_2 1_B\, dH + \int f_2^2 1_A\, d\mu_1 \int h_2^2 1_B\, dH.$$

Thus we have

$$\int f_1^2 1_A\, d\mu_1 \int h_1^2 1_B\, dH = -\int f_1 f_2 1_A\, d\mu_1 \int h_1 h_2 1_B\, dH = \int f_2^2 1_A\, d\mu_1 \int h_2^2 1_B\, dH$$

for all $A \in \mathcal{B}_1$ and $B \in \mathcal{B}_d$. By Fubini's theorem, this yields

$$\int_{A \times B} f_1^2 h_1^2\, d(\mu_1 \times H) = \int_{A \times B} f_2^2 h_2^2\, d(\mu_1 \times H)$$

for all such sets $A$, $B$. But this implies that the measures $\gamma_1$ and $\gamma_2$ defined by $\gamma_j(A \times B) = \int_{A \times B} f_j^2 h_j^2\, d(\mu_1 \times H)$, $j = 1, 2$, are equal for all the product sets $A \times B$, and hence, by a standard monotone class argument, we conclude that $\gamma_1 = \gamma_2$ as measures on $([0, \tau] \times \mathbb{R}^d, \mathcal{B}_1[0, \tau] \times \mathcal{B}_d)$. It follows that $f_1^2(u)h_1^2(z) = f_2^2(u)h_2^2(z)$ a.e. with respect to $\mu_1 \times H$. Thus we conclude that

$$\frac{f_1^2(u)}{f_2^2(u)} = \frac{h_2^2(z)}{h_1^2(z)} \quad \text{a.e. on } \{(u, z) : f_2^2(u) > 0,\ h_1^2(z) > 0\},$$

or, in other words,

$$\left(\frac{\Lambda(u)}{\Lambda_0(u)} - 1\right)^2 = (1 - \exp((\beta_0 - \beta)^T z))^2$$

a.e. with respect to $\mu_1 \times H$. This implies that (5.2) holds in view of C7.
Integrating across this identity with respect to $\mu_1$ yields

$$\int \left(\frac{\Lambda(u)}{\Lambda_0(u)} - 1\right)^2 d\mu_1(u) = (1 - \exp((\beta_0 - \beta)^T z))^2 \mu_1([0, \tau]) \quad \text{a.e. } H,$$

and hence the right-hand side is a constant a.e. $H$. But this implies that $\beta = \beta_0$ in view of C7. Combining this with the last display shows that (5.2) holds.

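For any given $\varepsilon > 0$, let $\hat\theta_{n,\varepsilon}^{ps} = (\hat\beta_n^{ps}, (1 - \varepsilon)\hat\Lambda_n^{ps} + \varepsilon\Lambda_0) = (\hat\beta_n^{ps}, \hat\Lambda_n^{ps}) + \varepsilon(0, \Lambda_0 - \hat\Lambda_n^{ps})$. Since $\mathbb{M}_n^{ps}(\hat\theta_n^{ps}) \ge \mathbb{M}_n^{ps}(\hat\theta_{n,\varepsilon}^{ps}) = \mathbb{M}_n^{ps}(\hat\theta_n^{ps} + \varepsilon(0, \Lambda_0 - \hat\Lambda_n^{ps}))$, it follows that

$$0 \ge \lim_{\varepsilon \downarrow 0} \frac{\mathbb{M}_n^{ps}(\hat\theta_n^{ps} + \varepsilon(0, \Lambda_0 - \hat\Lambda_n^{ps})) - \mathbb{M}_n^{ps}(\hat\theta_n^{ps})}{\varepsilon} = \mathbb{P}_n\left(\sum_{j=1}^{K}\left[\frac{N_{Kj}}{\hat\Lambda_{nKj}^{ps}} - \exp(\hat\beta_n^{psT} Z)\right](\Lambda_{0Kj} - \hat\Lambda_{nKj}^{ps})\right),$$

where $\hat\Lambda_{nKj}^{ps} = \hat\Lambda_n^{ps}(T_{K,j})$. This yields

$$\mathbb{P}_n\left[\sum_{j=1}^{K}\left(\frac{N_{Kj}\Lambda_{0Kj}}{\hat\Lambda_{nKj}^{ps}} + \hat\Lambda_{nKj}^{ps}\exp(\hat\beta_n^{psT} Z)\right)\right] \le \mathbb{P}_n\left[\sum_{j=1}^{K}(N_{Kj} + \Lambda_{0Kj}\exp(\hat\beta_n^{psT} Z))\right] \le C\,\mathbb{P}_n\left[\sum_{j=1}^{K}(N_{Kj} + \Lambda_{0Kj})\right] \to_{a.s.} C\,P\left[\sum_{j=1}^{K}(N_{Kj} + \Lambda_{0Kj})\right]$$

by C1-C3 and the strong law of large numbers. (Here $C$ represents a constant. In the sequel $C$ appearing in different lines may represent different constants.) The limit on the right-hand side is finite. On the other hand,

$$\limsup_{n \to \infty} \mathbb{P}_n\left[\sum_{j=1}^{K}\left(\frac{N_{Kj}\Lambda_{0Kj}}{\hat\Lambda_{nKj}^{ps}} + \hat\Lambda_{nKj}^{ps}\exp(\hat\beta_n^{psT} Z)\right)\right] \ge \limsup_{n \to \infty} \mathbb{P}_n\left(\sum_{j=1}^{K} 1_{[b,\tau]}(T_{K,j})\hat\Lambda_{nKj}^{ps}\exp(\hat\beta_n^{psT} Z)\right)$$

$$\ge C \limsup_{n \to \infty} \hat\Lambda_n^{ps}(b)\,\mathbb{P}_n\left(\sum_{j=1}^{K} 1_{[b,\tau]}(T_{K,j})\right) = C \limsup_{n \to \infty} \hat\Lambda_n^{ps}(b)\,\mu_1([b, \tau]).$$

Hence $\hat\Lambda_n^{ps}(t)$ is uniformly bounded almost surely for $t \in [0, b]$ if $\mu_1([b, \tau]) > 0$ for some $0 < b < \tau$, or for $t \in [0, \tau]$ if $\mu_1(\{\tau\}) > 0$. By the Helly selection theorem and the compactness of $\mathcal{R} \times \mathcal{F}$, it follows that $\hat\theta_n^{ps} = (\hat\beta_n^{ps}, \hat\Lambda_n^{ps})$ has a subsequence $\hat\theta_{n'}^{ps} = (\hat\beta_{n'}^{ps}, \hat\Lambda_{n'}^{ps})$ converging to $\theta^+ = (\beta^+, \Lambda^+)$, where $\Lambda^+$ is an increasing bounded function defined on $[0, b]$ for a $b < \tau$, and it can be defined on $[0, \tau]$ if $\mu_1(\{\tau\}) > 0$. Following the same argument as in proving Theorem 4.1 of Wellner and Zhang [21], we can show that $\mathbb{M}^{ps}(\theta^+) \ge \mathbb{M}^{ps}(\theta_0)$. Since $\mathbb{M}^{ps}(\theta_0) \ge \mathbb{M}^{ps}(\theta^+)$, by the argument above (5.1), we conclude that $\mathbb{M}^{ps}(\theta^+) = \mathbb{M}^{ps}(\theta_0)$. Then (5.2) implies that $\beta^+ = \beta_0$ and $\Lambda^+ = \Lambda_0$ a.e. in $\mu_1$. Finally, the dominated convergence theorem yields the strong consistency of $(\hat\beta_n^{ps}, \hat\Lambda_n^{ps})$ in the metric $d_1$.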
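To make the criterion function in this argument concrete, the following sketch evaluates the per-subject pseudo-likelihood term $m_\theta^{ps}(X) = \sum_j \{N_{Kj}\log\Lambda_{Kj} + N_{Kj}\beta^T Z - \Lambda_{Kj}\exp(\beta^T Z)\}$ for given inputs. The function name and argument layout are illustrative assumptions, not taken from the authors' programs.

```python
import math


def pseudo_loglik_term(counts, cum_means, z, beta):
    """Per-subject pseudo-likelihood term m^ps_theta(X).

    counts    : cumulative counts N(T_{K,1}), ..., N(T_{K,K})
    cum_means : candidate values Lambda(T_{K,1}), ..., Lambda(T_{K,K}), all > 0
    z, beta   : covariate vector and regression parameter of equal length
    """
    linear = sum(b * zi for b, zi in zip(beta, z))
    return sum(
        n * math.log(lam) + n * linear - lam * math.exp(linear)
        for n, lam in zip(counts, cum_means)
    )


# With beta = 0 and a single observation time, the term n*log(lam) - lam
# is maximized at lam = n, mirroring the role of h(x) = x*log(x) - x + 1.
print(pseudo_loglik_term([3], [3.0], [0.0], [0.0]))
```

Averaging this term over subjects gives $\mathbb{M}_n^{ps}(\theta)$; the identifiability argument above concerns its population counterpart.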

Now we turn to the maximum likelihood estimator. Let $\mathbb{M}_n(\theta) = n^{-1} l_n(\beta, \Lambda) = \mathbb{P}_n m_\theta(X)$ and $\mathbb{M}(\theta) = P m_\theta(X)$, where

$$m_\theta(X) = \sum_{j=1}^{K} \{\Delta N_{Kj}\log\Delta\Lambda_{Kj} + \Delta N_{Kj}\beta^T Z - \Delta\Lambda_{Kj}\exp(\beta^T Z)\}.$$

Much as in the pseudo-likelihood case, $\mathbb{M}$ has $\theta_0 = (\beta_0, \Lambda_0)$ as its unique maximum point, and

(5.3) $\beta = \beta_0$ and $\Lambda(v) - \Lambda(u) = \Lambda_0(v) - \Lambda_0(u)$ a.e. with respect to $\mu_2$.

The proof of consistency then proceeds along the same lines as for the pseudo-likelihood estimator; see Wellner and Zhang [22] for the detailed argument. The upshot is that $(\hat\beta_n, \hat\Lambda_n)$ is almost surely consistent in the metric $d_2$. □

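Proof of Theorem 3.2. We derive the rate of convergence by checking the conditions in Theorem 3.2.5 of van der Vaart and Wellner [18]. Here we give a detailed proof for the first part of the theorem, and for the second we point out the differences in the proof from the first. Let

$$m_\theta^{ps}(X) = \sum_{j=1}^{K} \{N_{Kj}\log\Lambda(T_{K,j}) + N_{Kj}\beta^T Z - \Lambda(T_{K,j})\exp(\beta^T Z)\}$$

with $N_{Kj} = N(T_{K,j})$ and $\mathbb{M}^{ps}(\theta) = P m_\theta^{ps}(X)$. We have

$$\mathbb{M}^{ps}(\theta_0) - \mathbb{M}^{ps}(\theta) = E_{(Z,K,T_K)}\left[\sum_{j=1}^{K} \Lambda(T_{K,j})\exp(\beta^T Z)\, h\!\left(\frac{\Lambda_0(T_{K,j})\exp(\beta_0^T Z)}{\Lambda(T_{K,j})\exp(\beta^T Z)}\right)\right];$$

since $h(x) \ge (1/4)(x - 1)^2$ for $0 \le x \le 5$, for any $\theta$ in a sufficiently small neighborhood of $\theta_0$

(5.4) $$\mathbb{M}^{ps}(\theta_0) - \mathbb{M}^{ps}(\theta) \ge \frac{1}{4} E_{(Z,K,T_K)}\left[\sum_{j=1}^{K} \Lambda(T_{K,j})\exp(\beta^T Z)\left\{\frac{\Lambda_0(T_{K,j})\exp(\beta_0^T Z)}{\Lambda(T_{K,j})\exp(\beta^T Z)} - 1\right\}^2\right] \ge C \int \{\Lambda(u)e^{\beta^T z} - \Lambda_0(u)e^{\beta_0^T z}\}^2\, d\nu_1(u, z)$$

by C1, C2 and C6.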
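The quadratic lower bound $h(x) \ge (1/4)(x-1)^2$ on $0 \le x \le 5$ used in (5.4) can be checked numerically. The sketch below is only a sanity check on a grid, not a proof; it also shows that the bound fails a little beyond $x = 5$, which is why the argument is restricted to a neighborhood of $\theta_0$.

```python
import math


def h(x):
    """h(x) = x*log(x) - x + 1, extended by continuity with h(0) = 1."""
    return x * math.log(x) - x + 1.0 if x > 0 else 1.0


# Smallest value of h(x) - (x - 1)^2 / 4 over a fine grid on [0, 5].
grid = [i / 1000.0 for i in range(0, 5001)]
smallest_gap = min(h(x) - 0.25 * (x - 1.0) ** 2 for x in grid)
print("min gap on [0, 5]:", smallest_gap)

# Outside the interval the quadratic eventually dominates, e.g. at x = 6.
print("gap at x = 6:", h(6.0) - 0.25 * (6.0 - 1.0) ** 2)
```

The minimum gap on the grid is attained at $x = 1$, where both sides vanish.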

Let $g(t) = \Lambda_t(U)\exp(\beta_t^T Z)$ with $\Lambda_t = t\Lambda + (1 - t)\Lambda_0$ and $\beta_t = t\beta + (1 - t)\beta_0$ for $0 \le t \le 1$ with $(U, Z) \sim \nu_1$. Then $\Lambda(U)\exp(\beta^T Z) - \Lambda_0(U)\exp(\beta_0^T Z) = g(1) - g(0)$ and hence, by the mean value theorem, there exists a $0 \le \xi \le 1$ such that $g(1) - g(0) = g'(\xi)$. Since

$$g'(\xi) = \exp(\beta_\xi^T Z)[(\Lambda - \Lambda_0)(U) + \{\Lambda_0 + \xi(\Lambda - \Lambda_0)\}(U)(\beta - \beta_0)^T Z]$$

$$= \exp(\beta_\xi^T Z)[(\Lambda - \Lambda_0)(U)\{1 + \xi(\beta - \beta_0)^T Z\} + (\beta - \beta_0)^T Z\Lambda_0(U)],$$

from (5.4) we have

$$P\{m_{\theta_0}^{ps}(X) - m_\theta^{ps}(X)\} \ge C \int [(\Lambda - \Lambda_0)(u)\{1 + \xi(\beta - \beta_0)^T z\} + (\beta - \beta_0)^T z\Lambda_0(u)]^2\, d\nu_1(u, z) = C\nu_1\{g_1 h + g_2\}^2,$$

where $g_1(U, Z) = (\beta - \beta_0)^T Z\Lambda_0(U)$, $g_2(U) = (\Lambda - \Lambda_0)(U)$ and $h(U, Z) = 1 + \xi(\Lambda - \Lambda_0)(U)/\Lambda_0(U)$ in the notation of Lemma 8.8, page 432, van der Vaart [17]. To apply van der Vaart's lemma we need to bound $\{\nu_1(g_1 g_2)\}^2$ by a constant less than one times $\nu_1(g_1^2)\nu_1(g_2^2)$. For the moment we write expectations under $\nu_1$ as $E_1$. But by the Cauchy-Schwarz inequality and then computing conditionally on $U$ we have

$$[E_1(g_1 g_2)]^2 = \{E_1[g_2 E_1(g_1 \mid U)]\}^2 \le E_1(g_2^2)\,E_1\{[E_1(g_1 \mid U)]^2\}$$

$$= E_1\{g_2^2\}\,E_1\{\Lambda_0^2(U)[E_1((\beta - \beta_0)^T Z \mid U)]^2\}$$

$$= E_1(g_2^2)\,E_1\{\Lambda_0^2(U)\,E_1[(\beta - \beta_0)^T (Z - (Z - E_1(Z \mid U)))^{\otimes 2}(\beta - \beta_0) \mid U]\}$$

$$\le (1 - \eta)\,E_1(g_2^2)\,E_1\{\Lambda_0^2(U)(\beta - \beta_0)^T E_1(ZZ^T \mid U)(\beta - \beta_0)\}$$

$$= (1 - \eta)\,E_1\{g_1^2\}\,E_1\{g_2^2\},$$

where the last inequality follows from C13. By van der Vaart's lemma,

$$\nu_1\{g_1 h + g_2\}^2 \ge C\{\nu_1(g_1^2) + \nu_1(g_2^2)\} \ge C\{|\beta - \beta_0|^2 + \|\Lambda - \Lambda_0\|_{L_2(\mu_1)}^2\} = C d_1^2(\theta, \theta_0).$$
To derive the rate of convergence, next we need to find a $\phi_n(\sigma)$ such that

$$E \sup_{d_1(\theta, \theta_0) < \sigma} |\mathbb{G}_n(m_\theta^{ps}(X) - m_{\theta_0}^{ps}(X))| \le C\phi_n(\sigma).$$

We let $\mathcal{M}_\delta^{ps}(\theta_0) = \{m_\theta^{ps}(X) - m_{\theta_0}^{ps}(X) : d_1(\theta, \theta_0) < \delta\}$ be the class of differences. We shall find an upper bound for the bracketing entropy numbers of this class. We also let $\mathcal{F}_\delta = \{\Lambda \in \mathcal{F} : \|\Lambda - \Lambda_0\|_{L_2(\mu_1)} \le \delta\}$. Since $\mathcal{F}_\delta$ is a class of monotone nondecreasing functions, by Theorem 2.7.5 of van der Vaart and Wellner [18], for any $\varepsilon > 0$, there exists a set of brackets $[\Lambda_1^l, \Lambda_1^r], [\Lambda_2^l, \Lambda_2^r], \ldots, [\Lambda_q^l, \Lambda_q^r]$ with $q \le \exp(M/\varepsilon)$, such that for any $\Lambda \in \mathcal{F}_\delta$, $\Lambda_i^l(t) \le \Lambda(t) \le \Lambda_i^r(t)$ for all $t \in O[T]$ and some $1 \le i \le q$, and $\int \{\Lambda_i^r(u) - \Lambda_i^l(u)\}^2\, d\mu_1(u) \le \varepsilon^2$. (Here we use the fact that $\mu_1$ is a finite measure under our hypotheses, and hence can be normalized to be a probability measure.)

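For sufficiently small $\varepsilon > 0$ and $\delta > 0$, we can construct the bracketing functions so that $\Lambda_i^r(t) - \Lambda_i^l(t) \le \gamma_1$ and $\Lambda_i^l(t) \ge \gamma_2$ with $\gamma_1, \gamma_2 > 0$ for all $t \in O[T]$ and $1 \le i \le q$. Here is the proof for this claim: For any $\Lambda \in \mathcal{F}_\delta$, the result of Lemma 7.1 implies that $\Lambda_0(t) - \varepsilon_1 \le \Lambda(t) \le \Lambda_0(t) + \varepsilon_1$ for a sufficiently small $\varepsilon_1 > 0$ [$\varepsilon_1$ can be chosen as $(\delta/C)^{2/3}$ in view of Lemma 7.1] and for all $t \in O[T]$. For any $1 \le i \le q$, there is a $\Lambda \in \mathcal{F}_\delta$ such that $\|\Lambda_i^r - \Lambda\|_{L_2(\mu_1)} \le \varepsilon$ and $\|\Lambda - \Lambda_i^l\|_{L_2(\mu_1)} \le \varepsilon$, which implies that $\|\Lambda_i^r - \Lambda_0\|_{L_2(\mu_1)} \le \varepsilon^*$ ($\varepsilon^* = \sqrt{\varepsilon^2 + \delta^2}$) and $\|\Lambda_i^l - \Lambda_0\|_{L_2(\mu_1)} \le \varepsilon^*$. By Lemma 7.1, this yields that $\Lambda_i^r(t) \le \Lambda_0(t) + \varepsilon_2$ and $\Lambda_i^l(t) \ge \Lambda_0(t) - \varepsilon_2$ for a sufficiently small $\varepsilon_2 > 0$. [$\varepsilon_2$ can be chosen as $(\varepsilon^*/C)^{2/3}$.] Therefore our claim is justified by letting $\gamma_1 = 2\varepsilon_2$ and $\gamma_2 = \Lambda_0(\sigma) - \varepsilon_2$, in view of C8.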

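Since $\beta \in \mathcal{R}$, a compact set in $\mathbb{R}^d$, we can construct an $\varepsilon$-net for $\mathcal{R}$, $\beta_1, \beta_2, \ldots, \beta_p$ with $p = [(M'/\varepsilon^d)]$, such that for any $\beta \in \mathcal{R}$, there is an $s$ such that

$$|\beta^T Z - \beta_s^T Z| \le \varepsilon \quad \text{and} \quad |\exp(\beta^T Z) - \exp(\beta_s^T Z)| \le C\varepsilon.$$

Therefore we can construct a set of brackets for $\mathcal{M}_\delta^{ps}(\theta_0)$ as follows:

$$[m_{i,s}^{ps,l}(X), m_{i,s}^{ps,r}(X)], \quad \text{for } i = 1, 2, \ldots, q;\ s = 1, 2, \ldots, p,$$

where

$$m_{i,s}^{ps,l}(X) = \sum_{j=1}^{K} [N_{Kj}\log\Lambda_i^l(T_{K,j}) + N_{Kj}(\beta_s^T Z - \varepsilon) - \Lambda_i^r(T_{K,j})\{\exp(\beta_s^T Z) + C\varepsilon\}] - m_{\theta_0}^{ps}(X)$$

and

$$m_{i,s}^{ps,r}(X) = \sum_{j=1}^{K} [N_{Kj}\log\Lambda_i^r(T_{K,j}) + N_{Kj}(\beta_s^T Z + \varepsilon) - \Lambda_i^l(T_{K,j})\{\exp(\beta_s^T Z) - C\varepsilon\}] - m_{\theta_0}^{ps}(X).$$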

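In what follows, we show that $\|f_{i,s}^{ps}(X)\|_{P,B}^2 = \|m_{i,s}^{ps,r}(X) - m_{i,s}^{ps,l}(X)\|_{P,B}^2 \le C\varepsilon^2$, where $\|\cdot\|_{P,B}$ is the "Bernstein norm" defined by $\|f\|_{P,B} = \{2P(e^{|f|} - 1 - |f|)\}^{1/2}$ (see van der Vaart and Wellner [18], page 324). Since $2(e^x - 1 - x) \le x^2 e^x$ for $x \ge 0$, it follows that $\|f\|_{P,B}^2 \le P(e^{|f|}|f|^2)$. Therefore, $\|f_{i,s}^{ps}(X)\|_{P,B}^2 \le P(e^{|f_{i,s}^{ps}(X)|}|f_{i,s}^{ps}(X)|^2)$. By writing out $f_{i,s}^{ps}(X) = m_{i,s}^{ps,r}(X) - m_{i,s}^{ps,l}(X)$, we find that

$$|f_{i,s}^{ps}(X)| \le \sum_{j=1}^{K} N_{Kj}(|\log\Lambda_i^r(T_{K,j}) - \log\Lambda_i^l(T_{K,j})| + 2\varepsilon) + \exp(\beta_s^T Z)\sum_{j=1}^{K}(\Lambda_i^r(T_{K,j}) - \Lambda_i^l(T_{K,j})) + C\varepsilon\sum_{j=1}^{K}(\Lambda_i^r(T_{K,j}) + \Lambda_i^l(T_{K,j})).$$

Since

(5.5) $\log y = \log x + (x + \xi(y - x))^{-1}(y - x)$ for $0 < x \le y$, some $\xi \in [0, 1]$,

we find that

$$\log\Lambda_i^r(T_{K,j}) \le \log\Lambda_i^l(T_{K,j}) + \gamma_2^{-1}(\Lambda_i^r(T_{K,j}) - \Lambda_i^l(T_{K,j}))$$

by construction of $\Lambda_i^l$. Hence, by C9 and our claim above, we conclude further that $\sum_{j=1}^{K}(|\log\Lambda_i^r(T_{K,j}) - \log\Lambda_i^l(T_{K,j})| + 2\varepsilon)$, $\sum_{j=1}^{K}(\Lambda_i^r(T_{K,j}) - \Lambda_i^l(T_{K,j}))$ and $\sum_{j=1}^{K}(\Lambda_i^r(T_{K,j}) + \Lambda_i^l(T_{K,j}))$ are all uniformly bounded in $O[T]$. More explicitly, taking $\varepsilon_2 \le 2^{-1}\Lambda_0(\sigma)$, noting that this implies $\delta \le C(2^{-1}\Lambda_0(\sigma))^{3/2} \equiv \delta_0$ with $C = (c_0/(24 f_0))^{1/2}$ by Lemma 7.1, and using the relations $\varepsilon_2 = (\varepsilon^*/C)^{2/3}$, $\varepsilon^* = (\varepsilon^2 + \delta^2)^{1/2} \ge \delta$ and $\varepsilon_2 \le 2^{-1}\Lambda_0(\sigma)$, we find that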

f>g K(Tkj) ~ log A{(T K ,) + 2ef 



2e 2 



-^VA (a)-e 2 



+ 2e) <4fc (l + C) 2 - 



Therefore, by arguing conditionally on $(Z, K, T_K)$ and using C10,

$$\|f_{i,s}^{ps}(X)\|_{P,B}^2 \le P(e^{|f_{i,s}^{ps}(X)|}|f_{i,s}^{ps}(X)|^2)$$



vN KK 



K 

n\ K Y J (^gK' i {T Kd )-\ogK\{T Kd )+2e) 2 
i=i 

+ exp(2/3 s T Z) 5Z(A[(r K ,j) - A^T^-)) 2 + Cs< 
3=1 

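By C6, C10 and Taylor expansion for $\log\Lambda_i^r(T_{K,j})$ at $\Lambda_i^l(T_{K,j})$ as shown above, we have

$$\|f_{i,s}^{ps}(X)\|_{P,B}^2 \le C\left\{E_{(K,T_K)}\left[\sum_{j=1}^{K}(\Lambda_i^r(T_{K,j}) - \Lambda_i^l(T_{K,j}))^2\right] + \varepsilon^2\right\} \le C\varepsilon^2.$$

This shows that the total number of $\varepsilon$-brackets for $\mathcal{M}_\delta^{ps}(\theta_0)$ will be of the order $(M/\varepsilon)^d e^{M'/\varepsilon}$ and hence $\log N_{[\,]}(\varepsilon, \mathcal{M}_\delta^{ps}(\theta_0), \|\cdot\|_{P,B}) \le C(1/\varepsilon)$.

We can similarly verify that $P(f_\theta^{ps}(X))^2 \le C\delta^2$ for any $f_\theta^{ps}(X) = m_\theta^{ps}(X) - m_{\theta_0}^{ps}(X) \in \mathcal{M}_\delta^{ps}(\theta_0)$. Hence by Lemma 3.4.3 of van der Vaart and Wellner [18],

$$E_P\|\mathbb{G}_n\|_{\mathcal{M}_\delta^{ps}(\theta_0)} \le C\tilde{J}_{[\,]}(\delta, \mathcal{M}_\delta^{ps}(\theta_0), \|\cdot\|_{P,B})\left[1 + \frac{\tilde{J}_{[\,]}(\delta, \mathcal{M}_\delta^{ps}(\theta_0), \|\cdot\|_{P,B})}{\delta^2\sqrt{n}}\right],$$

where

$$\tilde{J}_{[\,]}(\delta, \mathcal{M}_\delta^{ps}(\theta_0), \|\cdot\|_{P,B}) = \int_0^\delta \sqrt{1 + \log N_{[\,]}(\varepsilon, \mathcal{M}_\delta^{ps}(\theta_0), \|\cdot\|_{P,B})}\, d\varepsilon \le C\int_0^\delta \sqrt{1 + \frac{1}{\varepsilon}}\, d\varepsilon \le C\int_0^\delta \varepsilon^{-1/2}\, d\varepsilon \le C\delta^{1/2}.$$

Hence $\phi_n(\delta) = \delta^{1/2}(1 + \delta^{1/2}/(\delta^2\sqrt{n})) = \delta^{1/2} + \delta^{-1}/\sqrt{n}$. Then it is easy to see that $\phi_n(\delta)/\delta$ is a decreasing function of $\delta$, and $n^{2/3}\phi_n(n^{-1/3}) = n^{2/3}(n^{-1/6} + n^{1/3}n^{-1/2}) = 2\sqrt{n}$. So it follows by Theorem 3.2.5 of van der Vaart and Wellner [18] that

$$n^{1/3} d_1((\hat\beta_n^{ps}, \hat\Lambda_n^{ps}), (\beta_0, \Lambda_0)) = O_P(1).$$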
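The arithmetic behind the $n^{1/3}$ rate can be verified directly. The sketch below evaluates $\phi_n(\delta) = \delta^{1/2} + \delta^{-1}/\sqrt{n}$ and checks that $r_n = n^{1/3}$ satisfies $r_n^2\phi_n(1/r_n) = 2\sqrt{n}$ and that $\phi_n(\delta)/\delta$ decreases in $\delta$; the sample size used is arbitrary.

```python
import math


def phi_n(delta, n):
    """Modulus of continuity bound phi_n(delta) = delta^(1/2) + 1/(delta*sqrt(n))."""
    return math.sqrt(delta) + 1.0 / (delta * math.sqrt(n))


n = 10_000
r_n = n ** (1.0 / 3.0)

# r_n^2 * phi_n(1 / r_n) should equal 2 * sqrt(n), so r_n = n^(1/3) balances the rate condition.
lhs = r_n ** 2 * phi_n(1.0 / r_n, n)
print(lhs, 2.0 * math.sqrt(n))

# phi_n(delta) / delta should be decreasing in delta.
deltas = [0.01 * k for k in range(1, 101)]
ratios = [phi_n(d, n) / d for d in deltas]
print(all(a > b for a, b in zip(ratios, ratios[1:])))
```

Both terms of $\phi_n$ contribute $\sqrt{n}$ at $\delta = n^{-1/3}$, which is why a faster rate cannot be extracted from this entropy bound.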

For the maximum likelihood estimator $(\hat\beta_n, \hat\Lambda_n)$ the proof of the rate of convergence result as stated in Theorem 3.2 proceeds along the same lines as the rate result for the maximum pseudo-likelihood estimator given above, but with $g_1(U, V, Z) = (\beta - \beta_0)^T Z\Delta\Lambda_0(U, V)$, $g_2(U, V) = (\Delta\Lambda - \Delta\Lambda_0)(U, V)$ and $h(U, V, Z) = 1 + \xi(\Delta\Lambda - \Delta\Lambda_0)(U, V)/\Delta\Lambda_0(U, V)$ in the application of van der Vaart's Lemma 8.8. For details see Wellner and Zhang [22]. □

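Proof of Theorem 3.3. We give a detailed proof for the first part of the theorem, and only outline the differences in the proof for the second. We prove the theorem by checking the conditions A1-A6 of Theorem 6.1. Note that A1 holds with $\gamma = 1/3$ because of the rate of convergence given in Theorem 3.2. The criterion function with only one observation is given by $m^{ps}(\beta, \Lambda; X) = \sum_{j=1}^{K}\{N_{Kj}\log\Lambda_{Kj} + N_{Kj}\beta^T Z - e^{\beta^T Z}\Lambda_{Kj}\}$, and thus we have

$$m_1^{ps}(\beta, \Lambda; X) = \sum_{j=1}^{K} Z(N_{Kj} - \Lambda(T_{K,j})\exp(\beta^T Z)),$$

$$m_2^{ps}(\beta, \Lambda; X)[h] = \sum_{j=1}^{K}\left[\frac{N_{Kj}}{\Lambda_{Kj}} - \exp(\beta^T Z)\right]h_{Kj},$$

$$m_{11}^{ps}(\beta, \Lambda; X) = -\sum_{j=1}^{K}\Lambda_{Kj}ZZ^T\exp(\beta^T Z),$$

$$m_{12}^{ps}(\beta, \Lambda; X)[h] = m_{21}^{psT}(\beta, \Lambda; X)[h] = -\sum_{j=1}^{K} Z\exp(\beta^T Z)h_{Kj},$$

$$m_{22}^{ps}(\beta, \Lambda; X)[h_1, h_2] = -\sum_{j=1}^{K}\frac{N_{Kj}}{\Lambda_{Kj}^2}h_{1,Kj}h_{2,Kj},$$

where $\Lambda_{Kj} = \Lambda(T_{K,j})$ and $h_{Kj} = \int_0^{T_{K,j}} h(t)\, d\Lambda(t)$ for $h \in L_2(\Lambda)$. A2 automatically holds by the model assumption (1.1). For A3, we need to find an $\mathbf{h}^*$ such that

$$S_{12}^{ps}(\beta_0, \Lambda_0)[h] - S_{22}^{ps}(\beta_0, \Lambda_0)[\mathbf{h}^*, h] = P\{m_{12}^{ps}(\beta_0, \Lambda_0; X)[h] - m_{22}^{ps}(\beta_0, \Lambda_0; X)[\mathbf{h}^*, h]\} = 0,$$

for all $h \in L_2(\Lambda_0)$. Note that

$$P\{m_{12}^{ps}(\beta_0, \Lambda_0; X)[h] - m_{22}^{ps}(\beta_0, \Lambda_0; X)[\mathbf{h}^*, h]\} = E\left\{\sum_{j=1}^{K}\left[-Ze^{\beta_0^T Z}h_{Kj} + \frac{N_{Kj}}{\Lambda_{0Kj}^2}\mathbf{h}_{Kj}^* h_{Kj}\right]\right\} = -E_{(K,T_K,Z)}\left\{\sum_{j=1}^{K} e^{\beta_0^T Z}\left[Z - \frac{\mathbf{h}_{Kj}^*}{\Lambda_{0Kj}}\right]h_{Kj}\right\}.$$

Therefore, an obvious choice of $\mathbf{h}^*$ is

$$\mathbf{h}_{Kj}^* = \Lambda_{0Kj}E(Ze^{\beta_0^T Z} \mid K, T_{K,j})/E(e^{\beta_0^T Z} \mid K, T_{K,j}) \equiv \Lambda_{0Kj}R^{ps}(K, T_{K,j}).$$

Hence

$$m^{*ps}(\beta_0, \Lambda_0; X) = m_1^{ps}(\beta_0, \Lambda_0; X) - m_2^{ps}(\beta_0, \Lambda_0; X)[\mathbf{h}^*]$$

$$= \sum_{j=1}^{K}\left\{Z(N_{Kj} - e^{\beta_0^T Z}\Lambda_{0Kj}) - \left(\frac{N_{Kj}}{\Lambda_{0Kj}} - e^{\beta_0^T Z}\right)\Lambda_{0Kj}R^{ps}(K, T_{K,j})\right\}$$

$$= \sum_{j=1}^{K}(N_{Kj} - e^{\beta_0^T Z}\Lambda_{0Kj})[Z - R^{ps}(K, T_{K,j})],$$

$$A^{ps} = -S_{11}^{ps}(\beta_0, \Lambda_0) + S_{21}^{ps}(\beta_0, \Lambda_0)[\mathbf{h}^*]$$

$$= E_{(K,T_K,Z)}\left\{\sum_{j=1}^{K}\Lambda_{0Kj}e^{\beta_0^T Z}[Z - R^{ps}(K, T_{K,j})]Z^T\right\} = E_{(K,T_K,Z)}\left\{\sum_{j=1}^{K}\Lambda_{0Kj}e^{\beta_0^T Z}[Z - R^{ps}(K, T_{K,j})]^{\otimes 2}\right\}$$

and

$$B^{ps} = E m^{*ps}(\beta_0, \Lambda_0; X)^{\otimes 2} = E_{(K,T_K,Z)}\left\{\sum_{j,j'=1}^{K} C_{j,j'}^{ps}(Z)[Z - R^{ps}(K, T_{K,j})][Z - R^{ps}(K, T_{K,j'})]^T\right\},$$

with

$$C_{j,j'}^{ps}(Z) = E[(N_{Kj} - e^{\beta_0^T Z}\Lambda_{0Kj})(N_{Kj'} - e^{\beta_0^T Z}\Lambda_{0Kj'}) \mid Z, K, T_{K,j}, T_{K,j'}].$$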
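To verify A4, we note that the first part automatically holds, because

$$S_{1n}^{ps}(\hat\beta_n^{ps}, \hat\Lambda_n^{ps}) = \mathbb{P}_n m_1^{ps}(\hat\beta_n^{ps}, \hat\Lambda_n^{ps}; X) = 0$$

since $\hat\beta_n^{ps}$ satisfies the pseudo-score equation. Next we shall show that

(5.6) $$S_{2n}^{ps}(\hat\beta_n^{ps}, \hat\Lambda_n^{ps})[\mathbf{h}^*] = \mathbb{P}_n\left[\sum_{j=1}^{K}\frac{1}{\hat\Lambda_{nKj}^{ps}}\{N_{Kj} - \hat\Lambda_{nKj}^{ps}\exp(\hat\beta_n^{psT} Z)\}\Lambda_{0Kj}R^{ps}(K, T_{K,j})\right] = o_P(n^{-1/2})$$

with $\hat\Lambda_{nKj}^{ps} = \hat\Lambda_n^{ps}(T_{K,j})$. Since $\hat\theta_n^{ps}$ maximizes $\mathbb{P}_n m_\theta^{ps}(X)$ over the feasible region, consider a path $\theta_\varepsilon = (\hat\beta_n^{ps}, \hat\Lambda_n^{ps} + \varepsilon h)$ for $h \in \mathcal{F}$. Then

$$\lim_{\varepsilon \downarrow 0}\frac{d}{d\varepsilon}\mathbb{P}_n m_{\theta_\varepsilon}^{ps}(X) = \mathbb{P}_n\left[\sum_{j=1}^{K}\frac{1}{\hat\Lambda_{nKj}^{ps}}\{N_{Kj} - \hat\Lambda_{nKj}^{ps}\exp(\hat\beta_n^{psT} Z)\}h_{Kj}\right] = 0.$$

Now choose $h_{Kj} = \hat\Lambda_{nKj}^{ps}E(Z\exp(\beta_0^T Z) \mid K, T_{K,j})/E(\exp(\beta_0^T Z) \mid K, T_{K,j})$. Then to demonstrate (5.6), it suffices to show that

$$I = \mathbb{P}_n\left[\sum_{j=1}^{K}\frac{1}{\hat\Lambda_{nKj}^{ps}}\{N_{Kj} - \hat\Lambda_{nKj}^{ps}\exp(\hat\beta_n^{psT} Z)\}(\Lambda_{0Kj} - \hat\Lambda_{nKj}^{ps})a_{Kj}\right] = o_P(n^{-1/2}),$$

where $a_{Kj} = E(Z\exp(\beta_0^T Z) \mid K, T_{K,j})/E(\exp(\beta_0^T Z) \mid K, T_{K,j})$. But $I$ can be decomposed as $I = I_1 - I_2 + I_3$, where

$$I_1 = (\mathbb{P}_n - P)\left\{\sum_{j=1}^{K}\frac{N_{Kj}}{\hat\Lambda_{nKj}^{ps}}(\Lambda_{0Kj} - \hat\Lambda_{nKj}^{ps})a_{Kj}\right\},$$

$$I_2 = (\mathbb{P}_n - P)\left\{\sum_{j=1}^{K}\exp(\hat\beta_n^{psT} Z)(\Lambda_{0Kj} - \hat\Lambda_{nKj}^{ps})a_{Kj}\right\},$$

$$I_3 = P\left\{\sum_{j=1}^{K}\frac{1}{\hat\Lambda_{nKj}^{ps}}\{N_{Kj} - \hat\Lambda_{nKj}^{ps}\exp(\hat\beta_n^{psT} Z)\}(\Lambda_{0Kj} - \hat\Lambda_{nKj}^{ps})a_{Kj}\right\}.$$

We show that $I_1$, $I_2$ and $I_3$ are all $o_P(n^{-1/2})$. Let

$$\phi_1(X; \Lambda) = \sum_{j=1}^{K}\frac{N_{Kj}}{\Lambda_{Kj}}(\Lambda_{0Kj} - \Lambda_{Kj})a_{Kj},$$

$$\phi_2(X; \beta, \Lambda) = \sum_{j=1}^{K}\exp(\beta^T Z)(\Lambda_{0Kj} - \Lambda_{Kj})a_{Kj},$$

and define two classes $\Phi_1(\eta)$ and $\Phi_2(\eta)$ as

$$\Phi_1(\eta) = \{\phi_1 : \Lambda \in \mathcal{F} \text{ and } \|\Lambda - \Lambda_0\|_{L_2(\mu_1)} \le \eta\}$$

and

$$\Phi_2(\eta) = \{\phi_2 : (\beta, \Lambda) \in \mathcal{R} \times \mathcal{F} \text{ and } d_1((\beta, \Lambda), (\beta_0, \Lambda_0)) \le \eta\}.$$

Using the same bracketing entropy arguments as used in deriving the rate of convergence, it follows that both $\Phi_1(\eta)$ and $\Phi_2(\eta)$ are $P$-Donsker classes under conditions C1, C6 and C8. Moreover, for the seminorm $\rho_P(f) = \{P(f - Pf)^2\}^{1/2}$, under conditions C1, C6, C8 and C9, we have $\sup_{\phi_1 \in \Phi_1(\eta)}\rho_P(\phi_1) \to 0$ and $\sup_{\phi_2 \in \Phi_2(\eta)}\rho_P(\phi_2) \to 0$ if $\eta \to 0$. Due to the relationship between $P$-Donsker and asymptotic equicontinuity (see Corollary 2.3.12 of van der Vaart and Wellner [18]), this yields $I_1 = o_P(n^{-1/2})$ and $I_2 = o_P(n^{-1/2})$. For $I_3$, we have

$$|I_3| = \left|E_{(K,T_K,Z)}\left\{\sum_{j=1}^{K}\frac{\Lambda_{0Kj}\exp(\beta_0^T Z) - \hat\Lambda_{nKj}^{ps}\exp(\hat\beta_n^{psT} Z)}{\hat\Lambda_{nKj}^{ps}}(\Lambda_{0Kj} - \hat\Lambda_{nKj}^{ps})a_{Kj}\right\}\right| \le C d_1^2\{(\hat\beta_n^{ps}, \hat\Lambda_n^{ps}), (\beta_0, \Lambda_0)\},$$

by performing Taylor expansion of $\exp(\beta^T Z)$ at $\beta_0$ along with conditions C1, C6, C8 and the result of Lemma 7.1. Finally the rate of convergence yields $I_3 \le Cn^{-2/3}$ in probability and thus $I_3 = o_P(n^{-1/2})$.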
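To verify A5, we note that

(5.7) $$\sqrt{n}(S_{1n}^{ps} - S_1^{ps})(\beta, \Lambda) - \sqrt{n}(S_{1n}^{ps} - S_1^{ps})(\beta_0, \Lambda_0) = \mathbb{G}_n\left\{\sum_{j=1}^{K} Z[\Lambda_{0Kj}\exp(\beta_0^T Z) - \Lambda_{Kj}\exp(\beta^T Z)]\right\}$$

and

(5.8) $$\sqrt{n}(S_{2n}^{ps} - S_2^{ps})(\beta, \Lambda)[\mathbf{h}^*] - \sqrt{n}(S_{2n}^{ps} - S_2^{ps})(\beta_0, \Lambda_0)[\mathbf{h}^*] = \mathbb{G}_n\left\{\sum_{j=1}^{K}\left[\left(\frac{N_{Kj}}{\Lambda_{Kj}} - \frac{N_{Kj}}{\Lambda_{0Kj}}\right) - \{\exp(\beta^T Z) - \exp(\beta_0^T Z)\}\right]\mathbf{h}_{Kj}^*\right\}.$$

Let

$$a(\beta, \Lambda; X) = \sum_{j=1}^{K} Z\{\Lambda_{0Kj}\exp(\beta_0^T Z) - \Lambda_{Kj}\exp(\beta^T Z)\}$$

and

$$b(\beta, \Lambda; X) = \sum_{j=1}^{K}\left[N_{Kj}\left(\frac{1}{\Lambda_{Kj}} - \frac{1}{\Lambda_{0Kj}}\right) - \{\exp(\beta^T Z) - \exp(\beta_0^T Z)\}\right]\mathbf{h}_{Kj}^*.$$

For an $\eta > 0$, we define

$$\mathcal{A}(\eta) = \{a(\beta, \Lambda; X) : d_1\{(\beta, \Lambda), (\beta_0, \Lambda_0)\} \le \eta \text{ and } (\beta, \Lambda) \in \mathcal{R} \times \mathcal{F}\}$$

and

$$\mathcal{B}(\eta) = \{b(\beta, \Lambda; X) : d_1\{(\beta, \Lambda), (\beta_0, \Lambda_0)\} \le \eta \text{ and } (\beta, \Lambda) \in \mathcal{R} \times \mathcal{F}\}.$$

Then by applying the bracketing entropy arguments as in the rate of convergence proof, we can show that both $\mathcal{A}(\eta)$ and $\mathcal{B}(\eta)$ are $P$-Donsker classes under conditions C1, C6 and C8 and for a small enough $\eta > 0$. We can also show that $\sup_{a \in \mathcal{A}(\eta)}\rho_P\{a(\beta, \Lambda; X)\} \to 0$ and $\sup_{b \in \mathcal{B}(\eta)}\rho_P\{b(\beta, \Lambda; X)\} \to 0$ if $\eta \to 0$. Then the rate of convergence along with Corollary 2.3.12 of van der Vaart and Wellner [18] yields that

$$\sup_{|\beta - \beta_0| \le \delta_n,\ \|\Lambda - \Lambda_0\| \le Cn^{-1/3}} |\mathbb{G}_n a(\beta, \Lambda; X)| = o_P(1)$$

and

$$\sup_{|\beta - \beta_0| \le \delta_n,\ \|\Lambda - \Lambda_0\| \le Cn^{-1/3}} |\mathbb{G}_n b(\beta, \Lambda; X)| = o_P(1).$$

Hence A5 holds with $\gamma = 1/3$.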
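Finally, to verify A6, performing Taylor expansion of $m_1^{ps}(\beta, \Lambda; X)$ at the point $(\beta_0, \Lambda_0)$, we have

$$m_1^{ps}(\beta, \Lambda; X) = \sum_{j=1}^{K} Z\{N_{Kj} - \Lambda_{Kj}\exp(\beta^T Z)\}$$

$$= m_1^{ps}(\beta_0, \Lambda_0; X) + m_{11}^{ps}(\beta_0, \Lambda_0; X)(\beta - \beta_0) + m_{12}^{ps}(\beta_0, \Lambda_0; X)[\Lambda - \Lambda_0]$$

$$\quad - \sum_{j=1}^{K}\exp(\beta_\xi^T Z)ZZ^T(\beta - \beta_0)(\Lambda_{Kj} - \Lambda_{0Kj}) - \frac{1}{2}\sum_{j=1}^{K} Z\exp(\beta_\xi^T Z)\Lambda_{0Kj}(\beta - \beta_0)^T ZZ^T(\beta - \beta_0),$$

where $\beta_\xi = \beta_0 + \xi(\beta - \beta_0)$ for some $0 < \xi < 1$. This yields

(5.9) $$|S_1^{ps}(\beta, \Lambda) - S_1^{ps}(\beta_0, \Lambda_0) - S_{11}^{ps}(\beta_0, \Lambda_0)(\beta - \beta_0) - S_{12}^{ps}(\beta_0, \Lambda_0)[\Lambda - \Lambda_0]|$$

$$= \left|P\left\{\sum_{j=1}^{K}\exp(\beta_\xi^T Z)ZZ^T(\beta - \beta_0)(\Lambda_{Kj} - \Lambda_{0Kj}) + \frac{1}{2}\sum_{j=1}^{K} Z\exp(\beta_\xi^T Z)\Lambda_{0Kj}(\beta - \beta_0)^T ZZ^T(\beta - \beta_0)\right\}\right|.$$

Similarly, we have

(5.10) $$|S_2^{ps}(\beta, \Lambda)[\mathbf{h}^*] - S_2^{ps}(\beta_0, \Lambda_0)[\mathbf{h}^*] - S_{21}^{ps}(\beta_0, \Lambda_0)[\mathbf{h}^*](\beta - \beta_0) - S_{22}^{ps}(\beta_0, \Lambda_0)[\mathbf{h}^*, \Lambda - \Lambda_0]|$$

$$= \left|P\left\{\sum_{j=1}^{K}\left[\frac{N_{Kj}}{\Lambda_{\zeta Kj}^3}(\Lambda_{Kj} - \Lambda_{0Kj})^2 - \frac{1}{2}\exp(\beta_\zeta^T Z)(\beta - \beta_0)^T ZZ^T(\beta - \beta_0)\right]\mathbf{h}_{Kj}^*\right\}\right|,$$

where $\beta_\zeta = \beta_0 + \zeta(\beta - \beta_0)$ and $\Lambda_{\zeta Kj} = \Lambda_{0Kj} + \zeta(\Lambda_{Kj} - \Lambda_{0Kj})$ for some $0 < \zeta < 1$. Hence by C1, C3, C6, C7 and C8, it follows that (5.9) and (5.10) are bounded by $C\{|\beta - \beta_0|^2 + \|\Lambda - \Lambda_0\|_{L_2(\mu_1)}^2\}$, so A6 holds with $\alpha = 2$ and thus the proof for the first part of Theorem 3.3 is complete.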
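For the second part, first we note that with a single observation, $m(\beta, \Lambda; X) = \sum_{j=1}^{K}\{\Delta N_{Kj}\log\Delta\Lambda_{Kj} + \Delta N_{Kj}\beta^T Z - e^{\beta^T Z}\Delta\Lambda_{Kj}\}$, and hence

$$m_1(\beta, \Lambda; X) = \sum_{j=1}^{K} Z[\Delta N_{Kj} - \Delta\Lambda_{Kj}e^{\beta^T Z}],$$

$$m_2(\beta, \Lambda; X)[h] = \sum_{j=1}^{K}\left[\frac{\Delta N_{Kj}}{\Delta\Lambda_{Kj}} - e^{\beta^T Z}\right]\Delta h_{Kj},$$

$$m_{11}(\beta, \Lambda; X) = -\sum_{j=1}^{K}\Delta\Lambda_{Kj}ZZ^T e^{\beta^T Z},$$

$$m_{12}(\beta, \Lambda; X)[h] = m_{21}^T(\beta, \Lambda; X)[h] = -\sum_{j=1}^{K} Ze^{\beta^T Z}\Delta h_{Kj},$$

$$m_{22}(\beta, \Lambda; X)[h_1, h_2] = -\sum_{j=1}^{K}\frac{\Delta N_{Kj}}{(\Delta\Lambda_{Kj})^2}\Delta h_{1,Kj}\Delta h_{2,Kj},$$

where $\Delta h_{Kj} = \int_{T_{K,j-1}}^{T_{K,j}} h\, d\Lambda$ for $h \in L_2(\Lambda)$. A1 holds with $\gamma = 1/3$ and the norm $\|\cdot\|$ being $L_2(\mu_2)$ because of the rate of convergence established in Theorem 3.2. A2 holds by the model specification (1.1). For A3, we need to find an $\mathbf{h}^*$ such that

$$S_{12}(\beta_0, \Lambda_0)[h] - S_{22}(\beta_0, \Lambda_0)[\mathbf{h}^*, h] = P\{m_{12}(\beta_0, \Lambda_0; X)[h] - m_{22}(\beta_0, \Lambda_0; X)[\mathbf{h}^*, h]\} = 0,$$

for all $h \in L_2(\Lambda_0)$. Note that

$$P\{m_{12}(\beta_0, \Lambda_0; X)[h] - m_{22}(\beta_0, \Lambda_0; X)[\mathbf{h}^*, h]\} = E\left\{\sum_{j=1}^{K}\left[-Ze^{\beta_0^T Z}\Delta h_{Kj} + \frac{\Delta N_{Kj}}{(\Delta\Lambda_{0Kj})^2}\Delta\mathbf{h}_{Kj}^*\Delta h_{Kj}\right]\right\} = -E_{(K,T_K,Z)}\left\{\sum_{j=1}^{K} e^{\beta_0^T Z}\left[Z - \frac{\Delta\mathbf{h}_{Kj}^*}{\Delta\Lambda_{0Kj}}\right]\Delta h_{Kj}\right\}.$$

Therefore, an obvious choice of $\mathbf{h}^*$ satisfies

$$\Delta\mathbf{h}_{Kj}^* = \Delta\Lambda_{0Kj}E(Ze^{\beta_0^T Z} \mid K, T_{K,j-1}, T_{K,j})/E(e^{\beta_0^T Z} \mid K, T_{K,j-1}, T_{K,j}) \equiv \Delta\Lambda_{0Kj}R(K, T_{K,j-1}, T_{K,j}).$$

Hence

$$m^*(\beta_0, \Lambda_0; X) = m_1(\beta_0, \Lambda_0; X) - m_2(\beta_0, \Lambda_0; X)[\mathbf{h}^*]$$

$$= \sum_{j=1}^{K}\left\{Z(\Delta N_{Kj} - e^{\beta_0^T Z}\Delta\Lambda_{0Kj}) - \left(\frac{\Delta N_{Kj}}{\Delta\Lambda_{0Kj}} - e^{\beta_0^T Z}\right)\Delta\Lambda_{0Kj}R(K, T_{K,j-1}, T_{K,j})\right\}$$

$$= \sum_{j=1}^{K}(\Delta N_{Kj} - e^{\beta_0^T Z}\Delta\Lambda_{0Kj})[Z - R(K, T_{K,j-1}, T_{K,j})],$$

$$A = -S_{11}(\beta_0, \Lambda_0) + S_{21}(\beta_0, \Lambda_0)[\mathbf{h}^*]$$

$$= E_{(K,T_K,Z)}\left\{\sum_{j=1}^{K}\Delta\Lambda_{0Kj}e^{\beta_0^T Z}[Z - R(K, T_{K,j-1}, T_{K,j})]Z^T\right\} = E_{(K,T_K,Z)}\left\{\sum_{j=1}^{K}\Delta\Lambda_{0Kj}e^{\beta_0^T Z}[Z - R(K, T_{K,j-1}, T_{K,j})]^{\otimes 2}\right\}$$

and

$$B = E m^*(\beta_0, \Lambda_0; X)^{\otimes 2} = E_{(K,T_K,Z)}\left\{\sum_{j,j'=1}^{K} C_{j,j'}(Z)[Z - R(K, T_{K,j-1}, T_{K,j})][Z - R(K, T_{K,j'-1}, T_{K,j'})]^T\right\},$$

with

$$C_{j,j'}(Z) = E[(\Delta N_{Kj} - e^{\beta_0^T Z}\Delta\Lambda_{0Kj})(\Delta N_{Kj'} - e^{\beta_0^T Z}\Delta\Lambda_{0Kj'}) \mid Z, K, T_{K,j-1}, T_{K,j}, T_{K,j'-1}, T_{K,j'}].$$

The rest of the proof of the maximum likelihood part of Theorem 3.3 parallels the proof for the pseudo-likelihood estimator; see Wellner and Zhang [22] for the details. □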

6. A general theorem on the asymptotic normality of semiparametric M-estimators. In this section, we present a general theorem dealing with the asymptotic normality of semiparametric M-estimators of regression parameters when the rate of convergence of the estimator for nuisance parameters is slower than $n^{-1/2}$. We consider a general setting of a semiparametric model: given i.i.d. observations $X_1, X_2, \ldots, X_n$, we estimate unknown parameters $(\beta, \Lambda)$ by maximizing an objective function $n^{-1}\sum_{i=1}^{n} m(\beta, \Lambda; X_i) = \mathbb{P}_n m(\beta, \Lambda; X)$, where $\beta$ is a finite-dimensional parameter and $\Lambda$ is an infinite-dimensional parameter. If $m$ happens to be the log-likelihood function based on a single observation, then the estimator is simply the semiparametric maximum likelihood estimator. Our Theorem 6.1 here generalizes Theorem 6.1 of Huang [7] to accommodate the situation when the model is misspecified for the observed data. For a misspecified model, the information matrix calculated in Huang [7] is not relevant since (6.2) of Huang [7] is no longer valid. The notation we use here follows that of Huang [7].

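Let $\theta = (\beta, \Lambda)$, where $\beta \in \mathbb{R}^d$ and $\Lambda$ is an infinite-dimensional parameter in the class $\mathcal{F}$. Suppose that $\Lambda_\eta$ is a parametric path in $\mathcal{F}$ through $\Lambda$, that is, $\Lambda_\eta \in \mathcal{F}$, and $\Lambda_\eta|_{\eta=0} = \Lambda$.

Let $\mathcal{H} = \{h : h = \frac{\partial\Lambda_\eta}{\partial\eta}|_{\eta=0}\}$ and for any $h \in \mathcal{H}$ we define

$$m_1(\beta, \Lambda; x) = \nabla_\beta m(\beta, \Lambda; x) = \left(\frac{\partial m(\beta, \Lambda; x)}{\partial\beta_1}, \ldots, \frac{\partial m(\beta, \Lambda; x)}{\partial\beta_d}\right)^T,$$

$$m_2(\beta, \Lambda; x)[h] = \left.\frac{\partial m(\beta, \Lambda_\eta; x)}{\partial\eta}\right|_{\eta=0},$$

$$m_{11}(\beta, \Lambda; x) = \nabla_\beta^2 m(\beta, \Lambda; x),$$

$$m_{12}(\beta, \Lambda; x)[h] = \left.\frac{\partial m_1(\beta, \Lambda_\eta; x)}{\partial\eta}\right|_{\eta=0},$$

$$m_{21}(\beta, \Lambda; x)[h] = \nabla_\beta m_2(\beta, \Lambda; x)[h],$$

$$m_{22}(\beta, \Lambda; x)[h_1, h_2] = \left.\frac{\partial^2 m(\beta, \Lambda_{\eta_j}; x)}{\partial\eta_1\partial\eta_2}\right|_{\eta_j=0,\ j=1,2} = \left.\frac{\partial}{\partial\eta_2}m_2(\beta, \Lambda_{\eta_2}; x)[h_1]\right|_{\eta_2=0}.$$

We also define

$$S_1(\beta, \Lambda) = Pm_1(\beta, \Lambda; X), \qquad S_2(\beta, \Lambda)[h] = Pm_2(\beta, \Lambda; X)[h],$$

$$S_{1n}(\beta, \Lambda) = \mathbb{P}_n m_1(\beta, \Lambda; X), \qquad S_{2n}(\beta, \Lambda)[h] = \mathbb{P}_n m_2(\beta, \Lambda; X)[h],$$

$$S_{11}(\beta, \Lambda) = Pm_{11}(\beta, \Lambda; X), \qquad S_{22}(\beta, \Lambda)[h_1, h_2] = Pm_{22}(\beta, \Lambda; X)[h_1, h_2],$$

$$S_{12}(\beta, \Lambda)[h] = S_{21}^T(\beta, \Lambda)[h] = Pm_{12}(\beta, \Lambda; X)[h].$$

Furthermore, for $\mathbf{h} = (h_1, h_2, \ldots, h_d)^T \in \mathcal{H}^d$, where $h_j \in \mathcal{H}$ for $j = 1, 2, \ldots, d$, we denote

$$m_2(\beta, \Lambda; x)[\mathbf{h}] = (m_2(\beta, \Lambda; x)[h_1], \ldots, m_2(\beta, \Lambda; x)[h_d])^T,$$

$$m_{12}(\beta, \Lambda; x)[\mathbf{h}] = (m_{12}(\beta, \Lambda; x)[h_1], \ldots, m_{12}(\beta, \Lambda; x)[h_d]),$$

$$m_{21}(\beta, \Lambda; x)[\mathbf{h}] = (m_{21}(\beta, \Lambda; x)[h_1], \ldots, m_{21}(\beta, \Lambda; x)[h_d])^T,$$

$$m_{22}(\beta, \Lambda; x)[\mathbf{h}, h] = (m_{22}(\beta, \Lambda; x)[h_1, h], \ldots, m_{22}(\beta, \Lambda; x)[h_d, h])^T,$$

and define

$$S_2(\beta, \Lambda)[\mathbf{h}] = Pm_2(\beta, \Lambda; X)[\mathbf{h}], \qquad S_{2n}(\beta, \Lambda)[\mathbf{h}] = \mathbb{P}_n m_2(\beta, \Lambda; X)[\mathbf{h}],$$

$$S_{12}(\beta, \Lambda)[\mathbf{h}] = Pm_{12}(\beta, \Lambda; X)[\mathbf{h}], \qquad S_{21}(\beta, \Lambda)[\mathbf{h}] = Pm_{21}(\beta, \Lambda; X)[\mathbf{h}],$$

$$S_{22}(\beta, \Lambda)[\mathbf{h}, h] = Pm_{22}(\beta, \Lambda; X)[\mathbf{h}, h].$$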

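To establish the asymptotic distribution for the M-estimator $\hat\beta_n$, we need the following assumptions:

A1. $|\hat\beta_n - \beta_0| = o_p(1)$ and $\|\hat\Lambda_n - \Lambda_0\| = O_p(n^{-\gamma})$ for some $\gamma > 0$ and some norm $\|\cdot\|$.

A2. $S_1(\beta_0, \Lambda_0) = 0$ and $S_2(\beta_0, \Lambda_0)[h] = 0$ for all $h \in \mathcal{H}$.

A3. There exists an $\mathbf{h}^* = (h_1^*, \ldots, h_d^*)^T$, where $h_j^* \in \mathcal{H}$ for $j = 1, \ldots, d$, such that

$$S_{12}(\beta_0, \Lambda_0)[h] - S_{22}(\beta_0, \Lambda_0)[\mathbf{h}^*, h] = 0,$$

for all $h \in \mathcal{H}$. Moreover, the matrix

$$A = -S_{11}(\beta_0, \Lambda_0) + S_{21}(\beta_0, \Lambda_0)[\mathbf{h}^*] = -P(m_{11}(\beta_0, \Lambda_0; X) - m_{21}(\beta_0, \Lambda_0; X)[\mathbf{h}^*])$$

is nonsingular.

A4. The estimator $(\hat\beta_n, \hat\Lambda_n)$ satisfies

$$S_{1n}(\hat\beta_n, \hat\Lambda_n) = o_P(n^{-1/2}) \quad \text{and} \quad S_{2n}(\hat\beta_n, \hat\Lambda_n)[\mathbf{h}^*] = o_P(n^{-1/2}).$$

A5. For any $\delta_n \downarrow 0$ and $C > 0$,

$$\sup_{|\beta - \beta_0| \le \delta_n,\ \|\Lambda - \Lambda_0\| \le Cn^{-\gamma}} |\sqrt{n}(S_{1n} - S_1)(\beta, \Lambda) - \sqrt{n}(S_{1n} - S_1)(\beta_0, \Lambda_0)| = o_P(1)$$

and

$$\sup_{|\beta - \beta_0| \le \delta_n,\ \|\Lambda - \Lambda_0\| \le Cn^{-\gamma}} |\sqrt{n}(S_{2n} - S_2)(\beta, \Lambda)[\mathbf{h}^*] - \sqrt{n}(S_{2n} - S_2)(\beta_0, \Lambda_0)[\mathbf{h}^*]| = o_P(1).$$

A6. For some $\alpha > 1$ satisfying $\alpha\gamma > 1/2$, and for $(\beta, \Lambda)$ in a neighborhood of $(\beta_0, \Lambda_0)$: $\{(\beta, \Lambda) : |\beta - \beta_0| \le \delta_n, \|\Lambda - \Lambda_0\| \le Cn^{-\gamma}\}$,

$$|S_1(\beta, \Lambda) - S_1(\beta_0, \Lambda_0) - S_{11}(\beta_0, \Lambda_0)(\beta - \beta_0) - S_{12}(\beta_0, \Lambda_0)[\Lambda - \Lambda_0]| = o(|\beta - \beta_0|) + O(\|\Lambda - \Lambda_0\|^\alpha)$$

and

$$|S_2(\beta, \Lambda)[\mathbf{h}^*] - S_2(\beta_0, \Lambda_0)[\mathbf{h}^*] - S_{21}(\beta_0, \Lambda_0)[\mathbf{h}^*](\beta - \beta_0) - S_{22}(\beta_0, \Lambda_0)[\mathbf{h}^*, \Lambda - \Lambda_0]| = o(|\beta - \beta_0|) + O(\|\Lambda - \Lambda_0\|^\alpha).$$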
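Theorem 6.1. Suppose that assumptions A1-A6 hold. Then

$$\sqrt{n}(\hat\beta_n - \beta_0) = A^{-1}\sqrt{n}\,\mathbb{P}_n m^*(\beta_0, \Lambda_0; X) + o_{p^*}(1) \to_d N(0, A^{-1}B(A^{-1})^T),$$

where $m^*(\beta_0, \Lambda_0; x) = m_1(\beta_0, \Lambda_0; x) - m_2(\beta_0, \Lambda_0; x)[\mathbf{h}^*]$, $B = E m^*(\beta_0, \Lambda_0; X)^{\otimes 2} = E(m^*(\beta_0, \Lambda_0; X)m^*(\beta_0, \Lambda_0; X)^T)$, and $A$ is given in assumption A3.

Proof. A1 and A5 yield

$$\sqrt{n}(S_{1n} - S_1)(\hat\beta_n, \hat\Lambda_n) - \sqrt{n}(S_{1n} - S_1)(\beta_0, \Lambda_0) = o_P(1).$$

Since $S_{1n}(\hat\beta_n, \hat\Lambda_n) = o_{p^*}(n^{-1/2})$ by A4 and $S_1(\beta_0, \Lambda_0) = 0$ by A2, it follows that

$$\sqrt{n}S_1(\hat\beta_n, \hat\Lambda_n) + \sqrt{n}S_{1n}(\beta_0, \Lambda_0) = o_P(1).$$

Similarly,

$$\sqrt{n}S_2(\hat\beta_n, \hat\Lambda_n)[\mathbf{h}^*] + \sqrt{n}S_{2n}(\beta_0, \Lambda_0)[\mathbf{h}^*] = o_P(1).$$

Combining these equalities and A6 yields

(6.1) $$S_{11}(\beta_0, \Lambda_0)(\hat\beta_n - \beta_0) + S_{12}(\beta_0, \Lambda_0)[\hat\Lambda_n - \Lambda_0] + S_{1n}(\beta_0, \Lambda_0) + o(|\hat\beta_n - \beta_0|) + O(\|\hat\Lambda_n - \Lambda_0\|^\alpha) = o_P(n^{-1/2})$$

and

(6.2) $$S_{21}(\beta_0, \Lambda_0)[\mathbf{h}^*](\hat\beta_n - \beta_0) + S_{22}(\beta_0, \Lambda_0)[\mathbf{h}^*, \hat\Lambda_n - \Lambda_0] + S_{2n}(\beta_0, \Lambda_0)[\mathbf{h}^*] + o(|\hat\beta_n - \beta_0|) + O(\|\hat\Lambda_n - \Lambda_0\|^\alpha) = o_P(n^{-1/2}).$$

Because $\alpha\gamma > 1/2$, the rate of convergence assumption A1 implies $\sqrt{n}\,O(\|\hat\Lambda_n - \Lambda_0\|^\alpha) = o_P(1)$. Thus by A3 and (6.1) minus (6.2), where A3 cancels the terms involving $\hat\Lambda_n - \Lambda_0$, it follows that

$$(S_{11}(\beta_0, \Lambda_0) - S_{21}(\beta_0, \Lambda_0)[\mathbf{h}^*])(\hat\beta_n - \beta_0) + o(|\hat\beta_n - \beta_0|) = -(S_{1n}(\beta_0, \Lambda_0) - S_{2n}(\beta_0, \Lambda_0)[\mathbf{h}^*]) + o_P(n^{-1/2}),$$

that is,

$$-(A + o(1))(\hat\beta_n - \beta_0) = -\mathbb{P}_n m^*(\beta_0, \Lambda_0; X) + o_P(n^{-1/2}).$$

This yields

$$\sqrt{n}(\hat\beta_n - \beta_0) = (A + o(1))^{-1}\sqrt{n}\,\mathbb{P}_n m^*(\beta_0, \Lambda_0; X) + o_P(1) \to_d N(0, A^{-1}B(A^{-1})^T). \qquad \Box$$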
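The limiting covariance $A^{-1}B(A^{-1})^T$ in Theorem 6.1 has the familiar sandwich form. The sketch below computes it for a two-dimensional parameter using plain Python lists; the matrices are made-up illustrative values, not estimates from the paper, and the helper names are hypothetical. When $B = A$ with $A$ symmetric, the sandwich collapses to $A^{-1}$, which is a purely algebraic fact.

```python
import math


def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul(x, y):
    """Product of two matrices given as nested lists."""
    return [
        [sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def transpose(m):
    return [list(row) for row in zip(*m)]


def sandwich(a_matrix, b_matrix):
    """Sandwich covariance A^{-1} B (A^{-1})^T for 2x2 inputs."""
    a_inv = inverse_2x2(a_matrix)
    return matmul(matmul(a_inv, b_matrix), transpose(a_inv))


def standard_errors(cov, n):
    """Standard errors sqrt(diag(cov) / n) for a sample of size n."""
    return [math.sqrt(cov[i][i] / n) for i in range(len(cov))]


# Illustrative (made-up) matrices.
A = [[2.0, 1.0], [1.0, 3.0]]
B = [[2.5, 1.0], [1.0, 3.5]]
V = sandwich(A, B)
print(V, standard_errors(V, n=100))
```

In practice $A$ and $B$ would have to be replaced by estimates, which is the difficulty that motivates the bootstrap standard errors used in the data example.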
7. A technical lemma.

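Lemma 7.1. Suppose that conditions C8, C11 and C12 hold, and that $\Lambda \in \mathcal{F}$ satisfies $\|\Lambda - \Lambda_0\|_{L_2(\mu_1)} \le \eta$. Then there exists a constant $C$ independent of $\Lambda$ such that

$$\sup_{t \in O[T]} |\Lambda(t) - \Lambda_0(t)| \le (\eta/C)^{2/3}.$$

Proof. Suppose that $t_0 \in O[T]$ satisfies

$$|\Lambda(t_0) - \Lambda_0(t_0)| \ge (1/2)\sup_{t \in O[T]} |\Lambda(t) - \Lambda_0(t)| \equiv \xi/2.$$

Then either $\Lambda(t_0) \ge \Lambda_0(t_0) + \xi/2$, or $\Lambda_0(t_0) \ge \Lambda(t_0) + \xi/2$; that is, $\Lambda(t_0) \le \Lambda_0(t_0) - \xi/2$. In the first case we have

$$\eta^2 \ge \int \{\Lambda(t) - \Lambda_0(t)\}^2\, d\mu_1(t)$$

$$\ge \int_{t_0}^{\Lambda_0^{-1}(\xi/2 + \Lambda_0(t_0))} \{\Lambda(t) - \Lambda_0(t)\}^2\dot\mu_1(t)\, dt$$

$$\ge \int_{t_0}^{\Lambda_0^{-1}(\xi/2 + \Lambda_0(t_0))} \{\Lambda_0(t_0) + \xi/2 - \Lambda_0(t)\}^2\dot\mu_1(t)\, dt$$

$$\ge \int_{\Lambda_0(t_0)}^{\xi/2 + \Lambda_0(t_0)} \{\Lambda_0(t_0) + \xi/2 - x\}^2\dot\mu_1(\Lambda_0^{-1}(x))\frac{1}{\Lambda_0'(\Lambda_0^{-1}(x))}\, dx$$

$$\ge (c_0/f_0)\int_{\Lambda_0(t_0)}^{\xi/2 + \Lambda_0(t_0)} \{\Lambda_0(t_0) + \xi/2 - x\}^2\, dx = \frac{c_0}{24 f_0}\xi^3 = \frac{c_0}{24 f_0}\left(\sup_{t \in O[T]} |\Lambda(t) - \Lambda_0(t)|\right)^3.$$

This yields the stated conclusion with $C = \sqrt{c_0/(24 f_0)}$. In the second case the same conclusion holds by a similar argument. □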
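The constants in Lemma 7.1 can be traced numerically. The sketch below assumes, as the proof suggests, that $c_0$ is a lower bound on the density $\dot\mu_1$ and $f_0$ an upper bound on $\Lambda_0'$; the precise statements of C11 and C12 are not reproduced here, so these roles are an assumption, and the numerical values are arbitrary.

```python
import math


def sup_norm_bound(eta, c0, f0):
    """Bound (eta / C)^(2/3) with C = sqrt(c0 / (24 * f0)) from Lemma 7.1."""
    c_const = math.sqrt(c0 / (24.0 * f0))
    return (eta / c_const) ** (2.0 / 3.0)


def integral_of_square(xi, steps=100_000):
    """Midpoint-rule value of the integral of y^2 over [0, xi/2]; equals xi^3/24."""
    width = (xi / 2.0) / steps
    return sum(((k + 0.5) * width) ** 2 for k in range(steps)) * width


eta, c0, f0 = 0.01, 1.0, 2.0
xi = sup_norm_bound(eta, c0, f0)

# At the bound, the chain of inequalities in the proof holds with equality:
# (c0 / (24 f0)) * xi^3 = eta^2.
print(xi, (c0 / (24.0 * f0)) * xi ** 3, eta ** 2)
print(integral_of_square(xi), xi ** 3 / 24.0)
```

The exponent $2/3$ reflects the cubic growth of $\int_0^{\xi/2} y^2\,dy$: an $L_2$ distance of order $\eta$ can only control the supremum distance at order $\eta^{2/3}$.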

The result of Lemma 7.1 can be extended to the interval $S[T] = (0, \tau)$ as long as C12 is valid on $S[T]$ and $\dot\mu_1(t)$ is uniformly bounded away from zero for $t \in S[T]$.